My team and I have been using Prodigy to help us annotate text classification tasks with multiple users.
We set up a /?session=<username> URL for each annotator and hosted a single instance of Prodigy. However, we found that users were repeatedly served the same examples to annotate. I have checked that the "_input_hash" and "_task_hash" values for each example are different, as some of the forum discussions have suggested. I have also set "feed_overlap" and "force_stream_order" to true for our sessions.
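For reference, here is a rough sketch of how we start the server from Python (the recipe, dataset, labels, file names and port below are placeholders rather than our exact setup):

```python
import prodigy

# Rough sketch of our setup; recipe, dataset, labels and port are placeholders.
# Each annotator opens the app with their own named session appended to the URL,
# e.g. http://localhost:8080/?session=alice
prodigy.serve(
    "textcat.manual our_dataset ./examples.jsonl --label LABEL_A,LABEL_B",
    port=8080,
    feed_overlap=True,         # every named session should see every example
    force_stream_order=True,   # send batches out in the original stream order
)
```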
I would be happy to provide any additional examples/screenshots. We would really appreciate some direction on where we should check next!
Hi! Are you using the latest version of Prodigy, v1.11.7? We recently shipped a fix for a problem that could cause duplicate batches to be sent out to different annotators under certain configurations.
Another thing to keep an eye on in a multi-annotator setup is whether you end up in a scenario where annotators stop working before submitting their annotations, which triggers the "work stealing" mechanism. If an annotator requests a batch but doesn't annotate it for a longer period of time (e.g. if they just close the app), it will eventually be sent out again to the next available annotator, to make sure you don't end up losing a batch if an annotator never annotates it.
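To make that more concrete, here's a very simplified conceptual sketch of the idea (illustrative only, not Prodigy's actual implementation, and the timeout value is just an example):

```python
import time
from collections import deque

# Simplified, conceptual sketch of "work stealing" (not Prodigy's actual code).
# A batch that was sent out but not submitted within TIMEOUT seconds is put
# back on the stream, so the next session that asks for work can pick it up.
TIMEOUT = 60 * 60  # illustrative value, in seconds

stream = deque()   # batches that nobody is currently working on
checked_out = {}   # batch id -> (batch, time it was sent out)

def get_batch():
    now = time.time()
    # Any batch that has been checked out for too long becomes available again
    for batch_id, (batch, sent_at) in list(checked_out.items()):
        if now - sent_at > TIMEOUT:
            stream.append(batch)
            del checked_out[batch_id]
    if not stream:
        return None
    batch = stream.popleft()
    checked_out[id(batch)] = (batch, now)
    return batch

def submit_batch(batch):
    # Once the annotations are submitted, the batch can no longer be "stolen"
    checked_out.pop(id(batch), None)
```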
Thanks so much again for your reply! We have upgraded to v1.11.7 now.
In our case, we want to give the annotators some flexibility in how long they take to annotate.
So I wonder, can we turn off the "work stealing" mechanism you mentioned above, and what effect would that have on the multi-user sessions? Would you have other suggestions if we want a batch sent to a user to stay with them, rather than rolling over to the next available annotator?
We could expose a setting for the timeout, and that's probably a good idea. However, you typically want to have at least some level of work stealing enabled, because otherwise an annotator opening the app, looking around and then closing it again will leave a batch unannotated, which is typically not what you want.
If you enable PRODIGY_LOGGING=basic, you should see the log message "FEED: re-adding open tasks to stream" whenever tasks are re-added for the next annotator. This should give you an idea of whether this happens often in your scenario.
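For example, if you start the server from Python, something along these lines should do it (the recipe and its arguments are just placeholders; exporting PRODIGY_LOGGING=basic in your shell before running the prodigy command has the same effect):

```python
import os

# Enable basic logging before Prodigy starts, so that feed events such as
# "FEED: re-adding open tasks to stream" show up in the console output.
os.environ["PRODIGY_LOGGING"] = "basic"

import prodigy

prodigy.serve(
    "textcat.manual our_dataset ./examples.jsonl --label LABEL_A,LABEL_B",
    port=8080,
)
```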
I understand. I have one more question: what is the current timeout after which the stream triggers the work-stealing mechanism?
I am currently planning to tell the annotators that, for example, they have to be done with their set of cases within x amount of time, so that we can manually avoid the rollover issue. That is why I am asking.
Another thing I noticed is that we seem to be able to change the number of cases sent out at a time via the config file. Do you think any errors would occur if I set the number of cases in a batch to, say, 4,000-5,000?
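To be specific, the setting I mean is the batch size. For example, something like this, assuming it can be overridden like the other config settings (the value is just an example):

```python
import prodigy

# Illustrative only: overriding the batch size when starting the server.
# We're not sure whether a value in the thousands is supported or advisable.
prodigy.serve(
    "textcat.manual our_dataset ./examples.jsonl --label LABEL_A,LABEL_B",
    batch_size=5000,  # the default is 10
)
```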
Hi, can I follow up on this thread to ask whether this timeout setting was ever added as a config option? It seems like a good idea since, as you say, users may open the Prodigy URL, then close the tab, etc. Thanks
Hi @adin786,
In the end, we decided not to expose this setting (it's set to 3600s), as tweaking it could introduce unnecessary lags in the annotation flow just to avoid what is, in principle, a good tradeoff. Instead, we've been refactoring the feed logic to make sure there are fewer duplicates due to "work stealing", by only triggering this mechanism once a session gets to the end of its queue. Effectively, there will be very few duplicates, and only towards the end of the example stream.
This change will be available in the v1.11.9 release, scheduled for next week.
For the v2 release, we are working on a complete redesign of the feed mechanism that will eliminate the need for the work-stealing tradeoff altogether.