Missing data when


When using sessions for a multi-user project, I notice that if a user leaves their session and then comes back (for example, by closing the page and re-opening it a few minutes later), Prodigy "skips" a few examples. As I understand it, Prodigy works in batches of data. So when I leave the session, Prodigy drops the current batch, and when I reconnect later, it fetches a new batch, whether the old one is finished or not.

Then I tried setting the batch-size parameter to 1, but it didn't work. I had a total of 10 examples. I left the session once and came back to finish, and I ended up with 9 annotations. To get the missing example, I had to restart the workflow.

Do you have any idea how to fix this, so that people can "log out" whenever they want and pick up where they left off when they come back?

Thanks in advance :smile:

Hi! You can use the `"force_stream_order": true` setting in your prodigy.json to make the stream preserve the order of batches and examples and re-send the current batch until it has been annotated. So if a user closes the browser and then reopens the app later in the same session, they'll start again with the most recent unannotated example. (Otherwise, that example would be queued up again, but only after you restart the server.)
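For reference, here's a minimal sketch of what that could look like, assuming your prodigy.json doesn't contain any other settings yet. If it does, just add the key to the existing object instead of replacing the file:

```json
{
  "force_stream_order": true
}
```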

The only scenario where this wouldn't work is if you're using an active learning-powered recipe with a "dynamic" stream, or if you have multiple people accessing the same session (because then, they'd all receive the same batch and you end up with duplicates).

If you want to send examples back immediately as they're annotated, you can set `"instant_submit": true` in your prodigy.json. This can be helpful if you want your stream to be more responsive to the latest answer (e.g. to decide which follow-up examples to send out next). But it also means that there's no option to undo, because the answer is sent back immediately.
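As a rough sketch, and again assuming an otherwise empty prodigy.json, the setting would be added like this (merge it into your existing settings if you already have some):

```json
{
  "instant_submit": true
}
```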

Great, it worked! Thank you!