Is it possible to customise the output of the prodigy db-out recipe for an NER annotation dataset? The default output contains a lot of information, including hashes, the answer, timestamps and so on. Take the following example:
{"text":"hi good afternoon this is lily from a b c travel agency how can i help you"}
Here "afternoon" is a TIME entity. The prodigy db-out output is:
{"text":"hi good afternoon this is lily from a b c travel agency how can i help you","_input_hash":162856980,"_task_hash":1109836302,"_is_binary":false,"tokens":[{"text":"hi","start":0,"end":2,"id":0,"ws":true},{"text":"good","start":3,"end":7,"id":1,"ws":true},{"text":"afternoon","start":8,"end":17,"id":2,"ws":true},{"text":"this","start":18,"end":22,"id":3,"ws":true},{"text":"is","start":23,"end":25,"id":4,"ws":true},{"text":"lily","start":26,"end":30,"id":5,"ws":true},{"text":"from","start":31,"end":35,"id":6,"ws":true},{"text":"a","start":36,"end":37,"id":7,"ws":true},{"text":"b","start":38,"end":39,"id":8,"ws":true},{"text":"c","start":40,"end":41,"id":9,"ws":true},{"text":"travel","start":42,"end":48,"id":10,"ws":true},{"text":"agency","start":49,"end":55,"id":11,"ws":true},{"text":"how","start":56,"end":59,"id":12,"ws":true},{"text":"can","start":60,"end":63,"id":13,"ws":true},{"text":"i","start":64,"end":65,"id":14,"ws":true},{"text":"help","start":66,"end":70,"id":15,"ws":true},{"text":"you","start":71,"end":74,"id":16,"ws":false}],"_view_id":"ner_manual","spans":[{"start":8,"end":17,"token_start":2,"token_end":2,"label":"TIME"},{"start":26,"end":30,"token_start":5,"token_end":5,"label":"PERSON"}],"answer":"accept","_timestamp":1715756520,"_annotator_id":"2024-05-15_15-01-26","_session_id":"2024-05-15_15-01-26"}
1.a) Is it possible to remove most of this and keep only the minimum: the target label name plus the start and end of each labelled span, and perhaps some other basic information?
For example:
{"text":"hi good afternoon this is lily from a b c travel agency how can i help you","entities": [{"text": "afternoon", "label": "TIME"}]}
1.b) Can it be done with a Python script or by other means?
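For 1.b, here is a minimal sketch of the kind of script I have in mind. It assumes the db-out output has been saved to a JSONL file with one record per line, in the shape shown above. The function names (simplify, convert, convert_file) are hypothetical, and it uses only the Python standard library, not Prodigy's own API. Keeping only records whose answer is "accept" is also an assumption on my part.

```python
import json


def simplify(record):
    """Reduce one db-out record to its text plus a list of labelled entities."""
    text = record["text"]
    entities = [
        {
            # Recover the entity text from the character offsets of the span.
            "text": text[span["start"]:span["end"]],
            "label": span["label"],
            "start": span["start"],
            "end": span["end"],
        }
        for span in record.get("spans", [])
    ]
    return {"text": text, "entities": entities}


def convert(lines, keep_answer="accept"):
    """Yield simplified records for each non-empty JSONL line with the given answer."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("answer") == keep_answer:
            yield simplify(record)


def convert_file(src_path, dst_path):
    """Read db-out style JSONL from src_path and write the reduced JSONL to dst_path."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for item in convert(src):
            dst.write(json.dumps(item) + "\n")
```

With the export saved as, say, annotations.jsonl (a placeholder file name), calling convert_file("annotations.jsonl", "minimal.jsonl") would write one reduced record per line. The start and end keys can be dropped from simplify if only the entity text and label are wanted.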
Thanks for the great tool, Prodigy, and for your support! Any thoughts would be highly useful.
Cheers!
e101sg