Refreshing the page makes the document offset in the stream jump far forward

Hi,
I am seeing some strange behaviour when using a custom recipe for binary classification with blocks.
Whenever an annotator refreshes the page, the stream offset jumps forward and skips somewhere between 1 and X documents. Once it reaches the end, it shows "No tasks available" and never shows the documents it skipped.
Can you help me understand how to solve the issue?
I am using the out-of-the-box JSONL stream for my data. I am also using multi-session annotation (with /?session=NAME) and "feed_overlap": true.
I just found out that on each refresh it jumps ahead by about 10 lines (it started at 0, then went to 2, 21, 30, 40, 50, 60 ... 130, and then showed "No tasks available").
Thanks,
Keren

Hi! Are you setting "force_stream_order": true in your recipe? This should ensure that examples are always sent in the same order, and that a batch is re-sent if it hasn't been answered in a particular session.
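A minimal sketch of where the setting goes, assuming a recipe that returns the usual dictionary of components. The function name and the block layout are hypothetical placeholders; in a real recipe you would register the function with the `@prodigy.recipe` decorator and could use Prodigy's own JSONL loader instead of the small stdlib reader used here to keep the example self-contained.

```python
import json
from typing import Any, Dict, Iterator


def load_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    # Lazily yield one task per non-empty line of a JSONL file
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def binary_blocks_components(dataset: str, source: str) -> Dict[str, Any]:
    # Hypothetical recipe body: builds the components dict a recipe returns.
    # In a real recipe, this function would be decorated with @prodigy.recipe(...).
    return {
        "dataset": dataset,
        "stream": load_jsonl(source),
        "view_id": "blocks",
        "config": {
            # Placeholder block layout for a binary classification task
            "blocks": [{"view_id": "classification"}],
            # Every named session (/?session=NAME) gets to see all examples
            "feed_overlap": True,
            # Re-send an unanswered batch instead of moving on to the next one
            "force_stream_order": True,
        },
    }
```

The stream is a generator, so the file is only opened once Prodigy starts pulling tasks from it.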

By default, Prodigy considers a batch that has been sent out to be out for annotation, and waits for it to come back (because it can't know whether a session was discarded or not). So you can make multiple requests to the same session, and each request (e.g. from a different annotator) will receive a different batch. But this also means that when you refresh, you get the next available batch, and Prodigy can only find out which batches never came back when you restart the server and it queues up all unannotated examples again.
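To make the default behaviour concrete, here is a toy model of per-session batching. It is a hypothetical simplification written for illustration, not Prodigy's actual feed code: a batch that was handed out counts as "out for annotation" and is skipped on the next request, unless `force_stream_order` is set, in which case the first unanswered batch is sent again. The class, its methods and the batch size of 10 are assumptions, chosen to mirror the jumps of roughly 10 documents described in the question.

```python
from typing import Dict, List, Set


class ToyFeed:
    """Hypothetical, simplified model of per-session batching (not Prodigy's code)."""

    def __init__(self, n_examples: int, batch_size: int, force_stream_order: bool = False):
        # Split example IDs 0..n_examples-1 into consecutive batches
        self.batches: List[List[int]] = [
            list(range(i, min(i + batch_size, n_examples)))
            for i in range(0, n_examples, batch_size)
        ]
        self.force_stream_order = force_stream_order
        self.sent: Dict[str, Set[int]] = {}      # batch indices handed out, per session
        self.answered: Dict[str, Set[int]] = {}  # batch indices answered, per session

    def get_batch(self, session: str) -> List[int]:
        sent = self.sent.setdefault(session, set())
        answered = self.answered.setdefault(session, set())
        for idx in range(len(self.batches)):
            if idx in answered:
                continue
            if idx in sent and not self.force_stream_order:
                # Default: a batch already sent is "out for annotation", so skip it
                continue
            sent.add(idx)
            return self.batches[idx]
        return []  # corresponds to "No tasks available"

    def answer(self, session: str, batch: List[int]) -> None:
        # Mark a batch as answered for this session
        self.answered.setdefault(session, set()).add(self.batches.index(batch))

    def restart(self) -> None:
        # A restart forgets which batches were out; only answers persist
        self.sent = {}
```

With the default setting, three refreshes in the same session walk through three different batches and the fourth request gets nothing, while `restart()` makes the unanswered batches available again. With `force_stream_order=True`, a refresh returns the same batch until it has been answered.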

(Btw, the upcoming v1.11 will include a refactor of the feeds logic that makes some of these parts more intuitive and handles "work stealing": you'll get the same batch again on refresh, and "abandoned" batches can still be sent out again, e.g. if an annotator just stops annotating before saving their work.)

Setting "force_stream_order": true was the solution. Thanks a lot!
