Repeated Examples in Multi-user Sessions

rz789 · January 25, 2022, 11:43pm

Dear Ines and the Prodigy Team,

My team and I have been using the Prodigy tool to help us with annotating multiple user text classification tasks.

We set up a /?session=% username for each session and hosted a single instance of Prodigy. However, we found repeated annotating examples for the users. I have checked that our "_input_hash" and "_task_hash" for each example are different, as some of the forum discussion has suggested. I also set it up that the "feed overlap" and "force stream order" are "true" for our session.

I would be happy to provide any additional examples/screenshots. Would really appreciate your help to give us some directions as to where we should check next!

Thanks so much!

ines · January 26, 2022, 3:50pm

Hi! Are you using the latest version of Prodigy v1.11.7? We recently shipped a fix for a problem that could cause duplicate batches to be sent out to different annotators under certain configurations.

Another thing to keep an eye out for in your multi-annotator setup is whether you end up in scenario where annotators stop working before submitting their annotations and triggering the "work stealing" mechanism. If an annotator requests a batch but doesn't annotate it for a longer period of time (e.g. if they just close the app) it will eventually be sent out again to the next available annotator, to make sure you don't end up losing a batch if an annotator doesn't annotate it.

rz789 · January 27, 2022, 11:03pm

Dear Ines,

Thanks so much again for your reply! We have upgraded to v1.11.7 now.

In our case, we want to let the annotator to have some flexibility to the time it takes for them to annotate.

So I wonder, can we turn off the ""work stealing" mechanism" as you mentioned above, and what effect would it cause for the multiple user sessions? Would you have other suggestions if we want the batch sent to a user to stay, rather than rolling to the next available annotator?

Thanks so much again!

ines · January 28, 2022, 10:21am

We can expose the settings for the timeout and that's probably a good idea – although, you typically want to at least have some level of work stealing enabled because otherwise, an annotator opening the app, looking around and then closing it again will lead to a batch not being annotated, which is typically not what you want.

If you enable PRODIGY_LOGGING=basic, you should see a log message FEED: re-adding open tasks to stream whenever tasks are re-added for the next annotators. This should give you an idea whether this happens often in your scenario.

rz789 · January 29, 2022, 7:36pm

Dear Ines,

Thanks so much for your reply again!

I understand. I have one more question, what is the current time that the stream will enable the stealing mechanism.

I am currently planning to let the annotators know that for example, they have to be done with the number of cases by x amount of time, so that we manually avoid the roll over error. So that is why I have this question.

Another thing I find is that we seems to be able to change the number of cases in a stream by the config file. Would you think any error would occur, if I set the number of cases of the batch to be say 4-5000s?

Many thanks again!

adin786 · January 20, 2023, 11:01am

Hi, can I follow up on this thread to ask whether this timeout setting was ever added as a config option. It seems like a good idea in case as you say, users may open the prodigy url, then close the tab, etc. Thanks

magdaaniol · January 20, 2023, 4:51pm

Hi @adin786,
Eventually we decided not to expose this setting (it's set to 3600s) as it might potentially lead to unnecessary lags in the annotation flow to avoid what, in principle, is a good tradeoff. Instead, we've been refactoring the the feed logic to make sure there are fewer duplicates due to "work-stealing" by only triggering this mechanism once a session gets to the end of its queue. Effectively, there will be very few duplicates and only towards the end of the example stream.
This change will be available in v1.11.9 release scheduled next week.

For the v2 release, we are working on a complete redesign of feed mechanism that will eliminate the need for work-stealing tradeoff altogether.

ryanwesslen · January 24, 2023, 7:29pm

hi @adin786!

As Madga mentioned, we just released v1.11.9 which refactors the logic. You should have received an email if your Prodigy license is still active.

If you have questions, please respond back to that post as we're keeping an eye out for v1.11.9 related issues.

Topic		Replies	Views
multiple users session with feed_overlap set to false	2	141	March 6, 2024
Same examples being shown to different annotators usage , to-be-released , streams	8	567	November 19, 2021
Allowing for a constant stream of examples in a multi-annotator setting usage , streams , multi-user	3	288	April 17, 2024
Tasks are duplicated	3	441	June 7, 2023
Training with multiple annotators usage , solved	8	3822	July 18, 2023

Repeated Examples in Multi-user Sessions

Related topics