Repeated tasks with force_stream order and low batch size

Hi,
Problem: Prodigy (v1.9.1) repeats the same task several times. In other words, Prodigy proposes the same task more than once during annotation.

I executed Prodigy with the following command:
prodigy textcat.manual test_dec24_single13 united_input_h5.jsonl --label "L1","L2"

Content of the input file united_input_h5.jsonl:

{"text": "aaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ", "meta": {"source": "www.example.com"}}
{"text": "aaa 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ", "meta": {"source": "www.example.com"}}
{"text": "aaa 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ", "meta": {"source": "www.example.com"}}
{"text": "aaa 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ", "meta": {"source": "www.example.com"}}
{"text": "aaa 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ", "meta": {"source": "www.example.com"}}

Content of the output dataset test_dec24_single13 (obtained by executing prodigy db-out test_dec24_single13):

{"text":"aaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ","meta":{"source":"www.example.com"},"_input_hash":1820239773,"_task_hash":-2103791298,"options":[{"id":"L1","text":"L1"},{"id":"L2","text":"L2"}],"_session_id":null,"_view_id":"choice","accept":["L1"],"answer":"accept"}
{"text":"aaa 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ","meta":{"source":"www.example.com"},"_input_hash":1084935059,"_task_hash":-1917208248,"options":[{"id":"L1","text":"L1"},{"id":"L2","text":"L2"}],"_session_id":null,"_view_id":"choice","accept":["L1"],"answer":"accept"}
{"text":"aaa 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ","meta":{"source":"www.example.com"},"_input_hash":-2050076717,"_task_hash":-48850187,"options":[{"id":"L1","text":"L1"},{"id":"L2","text":"L2"}],"_session_id":null,"_view_id":"choice","accept":["L1"],"answer":"accept"}
{"text":"aaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ","meta":{"source":"www.example.com"},"_input_hash":1820239773,"_task_hash":-2103791298,"options":[{"id":"L1","text":"L1"},{"id":"L2","text":"L2"}],"_session_id":null,"_view_id":"choice","accept":["L2"],"answer":"accept"}

Here we see the problem: the task with text "aaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 " appears twice. It was actually proposed to the annotator twice by Prodigy and was annotated with different classes ("L1" the first time and "L2" the second time).
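In case it helps others checking their own exports: duplicates like this can be spotted programmatically by counting `_task_hash` values in the db-out output. A minimal sketch using the four rows above (texts abbreviated, hashes copied from the export):

```python
from collections import Counter

# The four exported tasks from the report, trimmed to the relevant fields.
export = [
    {"text": "aaa 1 ...", "_task_hash": -2103791298, "accept": ["L1"]},
    {"text": "aaa 2 ...", "_task_hash": -1917208248, "accept": ["L1"]},
    {"text": "aaa 3 ...", "_task_hash": -48850187, "accept": ["L1"]},
    {"text": "aaa 1 ...", "_task_hash": -2103791298, "accept": ["L2"]},
]

# A task hash that appears more than once means the same question was
# asked (and answered) multiple times.
counts = Counter(task["_task_hash"] for task in export)
duplicates = [h for h, n in counts.items() if n > 1]
print(duplicates)  # [-2103791298]
```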

Content of the settings file prodigy.json:

{
  "db": "mysql",
  "db_settings": {
    "mysql": {
      "host": "my-cluster.cluster-XXX.eu-west-1.rds.amazonaws.com",
      "user": "user-name",
      "passwd": "some-password",
      "db": "prodigy"
    }
  },
  "batch_size": 1,
  "host": "0.0.0.0",
  "show_stats": true,
  "show_flag": false,
  "instructions": "/home/yuri/.prodigy/instructions.html",
  "custom_theme": {"cardMaxWidth": 2000},
  "largeText": 3,
  "mediumText": 3,
  "smallText": 3,
  "javascript": "prodigy.addEventListener('prodigyanswer', event => {const selected = event.detail.task.accept || []; if (!selected.length) {alert('Task with no selected options submitted.')}})",
  "force_stream_order": true
}

Thank you!

Thanks for the detailed report :+1: My first guess is that this might be caused by how the new "force_stream_order": true setting interacts with a batch size of 1. With batches of only one example, the queue is always running low, so maybe this causes Prodigy to request the first batch (in this case, a single task) twice at the beginning, or something similar.

Thank you for a very quick response!
Changing "batch_size" to 2 did not help (the same behavior was observed), while changing "force_stream_order" to false did help: no repeated tasks were proposed.

Can this be considered a bug when "force_stream_order" is set to true?

A batch size of 2 is still low enough that the app immediately asks for new questions once the first task is annotated – possibly before the first answer is submitted. So what might be happening here is that the next batch of questions is requested from the server before Prodigy has had a chance to register the answer that was just submitted. The client therefore doesn't send that information back to the server and, as a result, receives the first question again.

I was able to reproduce the problem with a batch size of 1 or 2, but it seems to work as expected with a batch size of 3 and above. So this is likely a bug that only occurs with very low batch sizes where submitting a single answer already triggers a request for more questions to fill up the queue.
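To make the suspected race easier to follow, here's a toy model (assumed behavior for illustration, not Prodigy's actual implementation): a stream-order-preserving server re-sends every task it has no answer for yet, and with a batch size of 1 the client refills its queue before the previous answer reaches the server:

```python
class Server:
    """Toy server that preserves stream order by re-sending any task
    it has not yet received an answer for."""

    def __init__(self, tasks):
        self.tasks = tasks
        self.answered = set()

    def get_batch(self, size):
        # Hand out the earliest tasks that are still unanswered.
        pending = [t for t in self.tasks if t not in self.answered]
        return pending[:size]

    def submit(self, task):
        self.answered.add(task)


server = Server(["task 1", "task 2", "task 3"])
shown = []

queue = server.get_batch(1)      # client fetches its first batch
current = queue.pop(0)
shown.append(current)            # annotator sees "task 1"
queue += server.get_batch(1)     # queue is empty, refill fires *before* submit...
server.submit(current)           # ...so the answer arrives too late
shown.append(queue.pop(0))       # annotator sees "task 1" again
print(shown)                     # ['task 1', 'task 1']
```

With a larger batch size the queue still holds unanswered tasks when the refill happens, so the answer has time to reach the server before the duplicated task would be shown.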