Refreshing the page will jump the document offset in the stream far forward

kerenya · August 2, 2021, 11:43am

Hi,
I am facing a strange behaviour using a custom recipe for binary classification with blocks.
Whenever an annotator refreshes the page, the stream offset jumps forward and skips between 1 and X documents. Once it gets to the end, it shows "No tasks available" and won't show the ones it skipped.
Can you help me understand how to solve the issue?
I am using the JSONL stream (out of the box) for my data. I am also using multi-session annotation (with /?session=NAME) and feed_overlap=true.
I just found out that on each refresh it jumps ahead and starts roughly every 10 lines (it started at 0, then went to 2, 21, 30, 40, 50, 60 ... 130, and then showed "No tasks available").
Thanks,
Keren

ines · August 5, 2021, 12:04am

Hi! Are you setting "force_stream_order": true in your recipe? This should ensure that examples are always sent in the same order, and that a batch is re-sent if it hasn't been answered in a particular session.
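
As a rough sketch of where that setting could go: a custom recipe returns a dictionary of components, and the "config" entry can carry overrides such as "force_stream_order". The helper name `recipe_components`, the dataset name and the block layout below are made up for illustration, and in a real recipe the function would be registered with the `@prodigy.recipe` decorator (left out here so the snippet runs without Prodigy installed).

```python
def recipe_components(dataset, stream):
    """Hypothetical helper: builds the dict a custom recipe would return."""
    return {
        "dataset": dataset,          # dataset to save annotations to
        "stream": stream,            # iterable of task dicts
        "view_id": "blocks",         # the question mentions a blocks interface
        "config": {
            # Assumed block layout, for illustration only
            "blocks": [{"view_id": "classification"}],
            "feed_overlap": True,         # as used in the question
            "force_stream_order": True,   # the setting suggested above
        },
    }


# Example call with a placeholder dataset name and a one-item stream
components = recipe_components("my_dataset", iter([{"text": "example"}]))
```

The same key could presumably also be set in prodigy.json instead of the recipe, if you prefer to keep it out of the code.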

By default, Prodigy will consider a batch that was sent out as out for annotation, and will wait for it to come back (because it can't know whether a session was discarded or not). So you can make multiple requests to the same session, and every annotator will receive a different batch. But this also means that when you refresh, you will get the next available batch and can only know whether a batch has come back or not when you restart the server and queue up all unannotated examples.
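
To make that behaviour a bit more concrete, here is a toy model of the two modes. This is not Prodigy's actual feed implementation: `ToyFeed` and its methods are invented for illustration, and the batch size of 10 is an assumption (it would be consistent with the jumps of roughly 10 described in the question).

```python
class ToyFeed:
    """Toy stand-in for a task feed; NOT Prodigy's real implementation."""

    def __init__(self, examples, batch_size=10, force_stream_order=False):
        self.examples = list(examples)
        self.batch_size = batch_size
        self.force_stream_order = force_stream_order
        self.offset = 0          # position in the stream (default mode)
        self.answered = set()    # examples that have come back annotated

    def get_batch(self):
        if self.force_stream_order:
            # Always serve the first unanswered examples, in stream order,
            # so an unanswered batch is sent again on the next request.
            pending = [e for e in self.examples if e not in self.answered]
            return pending[: self.batch_size]
        # Default mode: a batch that was sent out counts as "out for
        # annotation", so every request moves on to the next batch.
        batch = self.examples[self.offset : self.offset + self.batch_size]
        self.offset += len(batch)
        return batch

    def answer(self, batch):
        self.answered.update(batch)


# Default mode: a page refresh is just another request for a batch.
default_feed = ToyFeed(range(30))
first_default = default_feed.get_batch()      # examples 0..9
refresh_default = default_feed.get_batch()    # examples 10..19, 0..9 skipped

# Forced stream order: the unanswered batch is served again on refresh.
ordered_feed = ToyFeed(range(30), force_stream_order=True)
first_ordered = ordered_feed.get_batch()      # examples 0..9
refresh_ordered = ordered_feed.get_batch()    # examples 0..9 again
ordered_feed.answer(first_ordered)
next_ordered = ordered_feed.get_batch()       # examples 10..19
```

In the default mode of this toy model, the skipped examples only become visible again if the feed is rebuilt from the unanswered examples, which corresponds to restarting the server.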

(Btw, the upcoming v1.11 will include a refactor of the feeds logic that makes some of these parts more intuitive and includes handling of "work stealing", so you'll get the same batch again on refresh, while also allowing "abandoned" batches to be sent out again, e.g. if an annotator just stops annotating before saving their work.)

kerenya · August 8, 2021, 1:47pm

Setting "force_stream_order": true was the solution. Thanks a lot!
