Missed examples in the Prodigy interface

Whenever I launch Prodigy for multiple sessions, open a session link (for example, localhost:8080/?session=sample_session), and close the tab without annotating anything, some of the examples are missing the next time I load that link.
Is there a way to prevent this, or at least to keep track of the examples that were missed?
Thank you!

Hi @daniyalSelani,

By default, Prodigy sends examples to the browser and then forgets about them. This means that if you refresh the page over and over, you'll get a new set of examples each time until the dataset is exhausted. With really large datasets where you don't annotate every example, this is usually not a problem, and can even be desirable.
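To make that concrete, here is a toy sketch of a "send and forget" feed. It is not Prodigy's actual code: make_feed and get_batch are made-up names, and each call to get_batch stands in for one page load.

```python
from itertools import islice


def make_feed(examples, batch_size):
    """Toy stand-in for a 'send and forget' feed (hypothetical, not Prodigy's code)."""
    stream = iter(examples)  # an iterator is consumed as it is read

    def get_batch():
        # Hand out the next batch; nothing is remembered about what was sent
        return list(islice(stream, batch_size))

    return get_batch


examples = [{"text": f"example {i}"} for i in range(5)]
get_batch = make_feed(examples, batch_size=2)

first_load = get_batch()   # page opened, tab closed without answering
second_load = get_batch()  # page reloaded: a different batch arrives
third_load = get_batch()   # the last remaining example
fourth_load = get_batch()  # stream exhausted: empty list, nothing left to send
```

With five examples and a batch size of two, the fourth load comes back empty even though nothing was ever annotated.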

A while ago we added a feed that repeats the same questions until they're answered. You can enable it by setting "force_stream_order": true in your prodigy.json file. With that setting, closing the tab will not cause new examples to be shown. In this mode, the same example may be shown to two users if they are annotating at the same time, but any duplicate answers are filtered out on the server when they're received.
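If you'd rather not edit the file by hand, a small script along these lines could flip the setting. This is only a sketch: set_force_stream_order is a made-up helper, and you need to pass it the path of the prodigy.json you actually use.

```python
import json
from pathlib import Path


def set_force_stream_order(config_path, enabled=True):
    """Set "force_stream_order" in a prodigy.json file, keeping other settings.

    Hypothetical helper for illustration; config_path is whichever
    prodigy.json your setup reads.
    """
    path = Path(config_path)
    # Start from the existing settings if the file is already there
    config = json.loads(path.read_text(encoding="utf8")) if path.exists() else {}
    config["force_stream_order"] = enabled
    path.write_text(json.dumps(config, indent=2), encoding="utf8")
    return config


# Example call (the path is an assumption, adjust it to your setup):
# set_force_stream_order("prodigy.json")
```

You'll need to restart the Prodigy server afterwards so the changed setting is picked up.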

If the repeating feed is not to your liking, you can leave the option turned off and rest assured that your examples are not forgotten when they're skipped by closing a tab. Because the server never receives answers for them, they will be shown to you again after you stop and restart the Prodigy server.
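If you want to keep track of which examples were missed, one option is to compare your source file with an export of the dataset. Here is a minimal sketch, assuming both are JSONL files and that every example has a "text" field; the file names and the helpers read_jsonl and find_missing are made up for illustration.

```python
import json


def read_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_missing(source_examples, annotated_examples, key="text"):
    """Return the source examples whose `key` value has no saved annotation."""
    seen = {eg[key] for eg in annotated_examples if key in eg}
    # Source examples without the key are reported as missing too
    return [eg for eg in source_examples if eg.get(key) not in seen]


# Example usage (hypothetical file names; the export could come from
# `prodigy db-out your_dataset > annotations.jsonl`):
# source = read_jsonl("source.jsonl")
# annotated = read_jsonl("annotations.jsonl")
# for eg in find_missing(source, annotated):
#     print(eg["text"])
```

Matching on the text is the simplest approach, but it treats two source examples with identical text as one.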

Let me know if this doesn't resolve your issue.

-Justin

Thank you so much!

Hi,

I am experiencing an issue where the session finishes with "No tasks available", but when I export the dataset with db-out, many examples are missing.

In my prodigy.json I set "force_stream_order": false and "feed_overlap": false.

I am using the ner.correct recipe with the nightly version 1.11.0a8. My dataset is small.

This happens whether or not I use a multi-user session (that is, it also happens in the main session).

I read your answer above, but I am confused by the last paragraph, since it seems to contradict the first paragraph.

If force_stream_order is set to false and I keep refreshing the browser without saving any tasks, will that exhaust my small dataset and then show "No tasks available"?

Once I see "No tasks available", I don't get the remaining, missing examples even if I restart the server. It still shows "No tasks available".