We've been struggling for some weeks with duplicates being shown to annotators in multi-user sessions when carrying out image annotation. This has led to a significant amount of re-work and re-annotation because the same examples are presented repeatedly.
Prodigy version: 1.11.8a4 (installed to try and resolve dupe issues we saw from another thread)
Our config includes:
We've now noticed messages in the log similar to the following:
FEED: re-adding 9 expired tasks to session
What we think is happening is that complex image annotation takes some time and, with a batch size of 10, the server is expiring a batch and then re-adding it to another annotator's session.
Obviously we can reduce the batch size, but we'd ideally like to keep the history at 10 so people can go back if needed (we understand the history is the min of batch size and history size).
Are we able to control the expiry time that the server waits for before it expires and re-allocates batches?
Apologies for the delay in getting back to you.
To answer your question: no, we do not expose the session timeout setting (it's set to 3600s). Changing it could introduce unnecessary lags in the annotation flow, all to avoid what is, in principle, a protection against losing annotation examples altogether. Instead, we've been refactoring the feed logic so that there are fewer duplicates due to timeout: the mechanism is now only triggered once a session gets to the end of its queue. Effectively, there will be far fewer duplicates, and only towards the end of the example stream.
This change is available in the latest release. Furthermore, for the v2 release, we are working on a complete redesign of the feed mechanism that will eliminate duplicates due to timeout altogether.
An important thing to note is that v1.11.9 is a patch release on 1.11.8, so it uses the same DB setup as 1.11.8. I'd like to stress that 1.11.8a4 was an experimental release.
See the v1.11.9 release post for more details on how we are handling timed-out examples.
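For readers who want a mental model of the behaviour described in this reply (timed-out tasks are only re-sent once a session reaches the end of its queue), here is a minimal toy sketch in plain Python. It is not Prodigy's feed code and does not use any Prodigy API: `ToyFeed`, `get_batch`, `answer` and the injectable `clock` are hypothetical names made up for illustration, and only the 3600s timeout and batch size of 10 come from the thread.

```python
import time
from collections import deque


class ToyFeed:
    """Toy model of a shared task feed. NOT Prodigy's actual implementation."""

    def __init__(self, tasks, batch_size=10, timeout=3600.0, clock=time.monotonic):
        self.queue = deque(tasks)   # tasks not yet sent to any session
        self.batch_size = batch_size
        self.timeout = timeout      # seconds before an unanswered task counts as expired
        self.clock = clock          # injectable so the example can fake the passage of time
        self.pending = {}           # task -> (session name, time it was sent)

    def get_batch(self, session):
        # Serve fresh tasks from the main queue first.
        batch = []
        while self.queue and len(batch) < self.batch_size:
            batch.append(self.queue.popleft())
        # Only when the queue is exhausted, fall back to tasks that another
        # session received but has not answered within the timeout.
        if not batch:
            now = self.clock()
            expired = [
                task
                for task, (owner, sent_at) in self.pending.items()
                if owner != session and now - sent_at >= self.timeout
            ]
            batch = expired[: self.batch_size]
        # Record who holds each task and since when.
        now = self.clock()
        for task in batch:
            self.pending[task] = (session, now)
        return batch

    def answer(self, task):
        # An answered task can no longer expire.
        self.pending.pop(task, None)


# Fake clock so the timeout can be simulated without waiting.
now = [0.0]
feed = ToyFeed(range(4), batch_size=2, timeout=3600, clock=lambda: now[0])

a = feed.get_batch("alice")  # alice receives the first two tasks and goes idle
now[0] = 4000.0              # more than the timeout passes
b = feed.get_batch("bob")    # queue still has fresh tasks, so nothing is re-sent
for task in b:
    feed.answer(task)
c = feed.get_batch("bob")    # queue is empty, so alice's expired tasks are re-sent
```

In this toy model, duplicates can only appear at the very end of the stream, which matches the description above. An older design that checked for expired tasks on every request would instead have handed alice's tasks to bob as soon as the timeout passed.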