Hi,
Problem: Prodigy (v1.9.1) repeats the same task several times. In other words, Prodigy presents the same task more than once during annotation.
I ran Prodigy with the following command:
prodigy textcat.manual test_dec24_single13 united_input_h5.jsonl --label "L1","L2"
Content of the input dataset united_input_h5.jsonl:
{"text": "aaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ", "meta": {"source": "www.example.com"}}
{"text": "aaa 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ", "meta": {"source": "www.example.com"}}
{"text": "aaa 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ", "meta": {"source": "www.example.com"}}
{"text": "aaa 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ", "meta": {"source": "www.example.com"}}
{"text": "aaa 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ", "meta": {"source": "www.example.com"}}
Content of the output dataset test_dec24_single13 (obtained by executing prodigy db-out test_dec24_single13):
{"text":"aaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ","meta":{"source":"www.example.com"},"_input_hash":1820239773,"_task_hash":-2103791298,"options":[{"id":"L1","text":"L1"},{"id":"L2","text":"L2"}],"_session_id":null,"_view_id":"choice","accept":["L1"],"answer":"accept"}
{"text":"aaa 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ","meta":{"source":"www.example.com"},"_input_hash":1084935059,"_task_hash":-1917208248,"options":[{"id":"L1","text":"L1"},{"id":"L2","text":"L2"}],"_session_id":null,"_view_id":"choice","accept":["L1"],"answer":"accept"}
{"text":"aaa 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ","meta":{"source":"www.example.com"},"_input_hash":-2050076717,"_task_hash":-48850187,"options":[{"id":"L1","text":"L1"},{"id":"L2","text":"L2"}],"_session_id":null,"_view_id":"choice","accept":["L1"],"answer":"accept"}
{"text":"aaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ","meta":{"source":"www.example.com"},"_input_hash":1820239773,"_task_hash":-2103791298,"options":[{"id":"L1","text":"L1"},{"id":"L2","text":"L2"}],"_session_id":null,"_view_id":"choice","accept":["L2"],"answer":"accept"}
Here we see the problem: the task with text "aaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 " appears twice. Prodigy actually presented it to the annotator twice, and it was annotated with a different class each time ("L1" the first time and "L2" the second). Both records have the same _input_hash and _task_hash.
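For anyone who wants to check their own export for the same symptom, here is a minimal sketch that groups the db-out records by _task_hash and reports every hash that occurs more than once. The function name find_repeated_tasks and the file name annotations.jsonl are hypothetical, and the sketch assumes the db-out output has been saved to a file first.

```python
import json
from collections import defaultdict


def find_repeated_tasks(lines):
    """Group exported records by _task_hash; keep hashes seen more than once.

    `lines` is any iterable of JSONL strings, e.g. an open file handle.
    Returns {task_hash: [accept list of each occurrence, in export order]}.
    """
    by_hash = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the export
        record = json.loads(line)
        by_hash[record["_task_hash"]].append(record.get("accept", []))
    return {h: answers for h, answers in by_hash.items() if len(answers) > 1}


# Usage (hypothetical file name), after saving the export with
#   prodigy db-out test_dec24_single13 > annotations.jsonl
#
# with open("annotations.jsonl", encoding="utf-8") as f:
#     for task_hash, answers in find_repeated_tasks(f).items():
#         print(task_hash, answers)
```

On the output shown above, this would flag only _task_hash -2103791298, with the answers ["L1"] and ["L2"].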
Content of the settings file prodigy.json:
{
  "db": "mysql",
  "db_settings": {
    "mysql": {
      "host": "my-cluster.cluster-XXX.eu-west-1.rds.amazonaws.com",
      "user": "user-name",
      "passwd": "some-password",
      "db": "prodigy"
    }
  },
  "batch_size": 1,
  "host": "0.0.0.0",
  "show_stats": true,
  "show_flag": false,
  "instructions": "/home/yuri/.prodigy/instructions.html",
  "custom_theme": {"cardMaxWidth": 2000},
  "largeText": 3,
  "mediumText": 3,
  "smallText": 3,
  "javascript": "prodigy.addEventListener('prodigyanswer', event => {const selected = event.detail.task.accept || []; if (!selected.length) {alert('Task with no selected options submitted.')}})",
  "force_stream_order": true
}
Thank you!