Same examples being shown to different annotators


I am having an issue with multi-user sessions: two different users are being given the same examples in the same order. Even after one user annotates an example, moves to the next one, and saves, the other user still sees the same example upon reloading the page (although I do see the total annotations increase for both users after saving).

This is the command I run to start the annotations on dummy data...
python -m prodigy textcat.manual dummy_dataset .\Desktop\dummy_text.txt --label A,B

I set the allowed sessions environment variable (PRODIGY_ALLOWED_SESSIONS=alex,bob) and this is how I open the two users in separate tabs...

This is my prodigy.json. I have also tried setting exclude_by to "task" and force_stream_order to false, with the same results...
"theme": "basic",
"feed_overlap": false,
"exclude_by": "input",

What might be wrong here? I think I have looked through other posts on this issue and tried all those suggestions with no success.

Also, two specific questions about multi-user sessions:

  1. When feed_overlap is false, does user1 have to save their annotation for user2 to not see it or should they just never be shown the same examples?
  2. If they do have to save it or if multiple tabs are open under the same user, is there a way to shuffle on the front end so the chance of overlap is less and can at least be corrected more easily if it does happen?

Please let me know. Thank you!

Hi! How many examples are in your dummy_text.txt file? If it's really short, there's a mechanism in Prodigy where unannotated examples in one user's session will be queued into another user's session if there are no more new examples available. The goal is to ensure that one user can't queue up examples, never annotate them, and cause those examples to be lost entirely.

About 10 examples. I have tried with 5-15 examples with the same results. As I step through the examples in each tab, it looks like each user gets every example. How many should I have so this doesn't happen?

batch_size * 2 unique examples should do it, so 20 if you're using the default batch_size of 10. You can also set a smaller batch_size in your prodigy.json.
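
If it helps, here is a minimal sketch for checking an input file against that rule of thumb before starting the server. It only counts unique non-empty lines in a plain-text file; the helper name has_enough_examples and its parameters are made up for illustration and are not part of Prodigy.

```python
from pathlib import Path


def has_enough_examples(path, batch_size=10, n_sessions=2):
    """Count unique non-empty lines and compare with batch_size * n_sessions.

    Hypothetical helper, not a Prodigy function. It assumes one example
    per line, as in a plain .txt source file.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    unique = {line.strip() for line in lines if line.strip()}
    needed = batch_size * n_sessions
    # Return the counts too, so it is easy to see how far off the file is.
    return len(unique), needed, len(unique) >= needed
```

For example, has_enough_examples("dummy_text.txt", batch_size=10) returns a tuple of (unique examples found, examples needed, whether that is enough).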

I put in 22 examples with batch_size of 10 on a new dataset and still get the same issue. It's shuffled now, but when user1 annotates and saves, user2 still eventually comes to the example that was already annotated by user1. Even after reloading the page for user2.

The weird thing is that when user2 reloads, I can see total annotations go up on user2's end, but user2 still ends up seeing them.

To confirm, what version of Prodigy are you using? I'll see if I can reproduce.

Version 1.11.5

Alright, I see the issue you're running into, and it's something we've identified before. It's important to note that this situation will only really come up in the last (batch_size * n_annotators) examples. Because you have such a small number of examples, the first 20 examples are also the last 20.

Prodigy is designed this way under the principle that we'd rather you get an example annotated twice than not at all. That being said, I'll add a small improvement for the next release that should reduce the number of duplicates you would see (especially in a test scenario like this one).
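
To illustrate the principle, here is a toy model of the re-queue behaviour described earlier in this thread. It is only a sketch under simplifying assumptions and is not Prodigy's actual implementation: sessions take turns pulling one batch from a shared stream, nobody saves, and a session that finds the stream empty is handed whatever the other sessions still hold. The function name first_batches is hypothetical.

```python
from collections import deque


def first_batches(n_examples, batch_size, sessions):
    """Toy model (not Prodigy code): first batch each session receives."""
    stream = deque(range(n_examples))            # shared feed of new examples
    pending = {name: [] for name in sessions}    # sent to a session, not saved

    for name in sessions:
        # Take up to batch_size new examples from the shared stream.
        batch = [stream.popleft() for _ in range(min(batch_size, len(stream)))]
        if not batch:
            # No new examples left: re-queue what other sessions still hold,
            # so that unsaved examples are not lost.
            for other in sessions:
                if other != name:
                    batch.extend(pending[other])
            batch = batch[:batch_size]
        pending[name] = batch
    return pending
```

In this model, first_batches(10, 10, ["alex", "bob"]) gives both sessions the same ten examples, while first_batches(22, 10, ["alex", "bob"]) gives them disjoint first batches. It does not model what happens after saving or in the final batches.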

I see. Thank you! This will work fine for our actual dataset.

Also, I noticed something else which won't really be a problem for us, but I wanted to bring it up: I set the batch size to 5, created a dummy txt file containing the numbers 1-42, and opened two named sessions, alternating back and forth between them while labeling but not saving any examples. The last 2 batches do overlap, as you say, but by the time it says there are no more examples, one of the batches has been skipped. I tried this twice and saw it both times. Strangely, total annotations shows as 42 at the end even though I know one batch of 5 was skipped.
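
One way to double-check what actually ended up in the dataset would be to export it (for example with prodigy db-out dummy_dataset > annotations.jsonl) and compare it against the source texts. The sketch below assumes each saved example has a "text" field and a "_session_id" field; the helper names load_jsonl and audit are hypothetical and not part of Prodigy.

```python
import json
from collections import defaultdict


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def audit(examples, expected_texts):
    """Return (texts never saved, texts saved by more than one session)."""
    sessions_by_text = defaultdict(set)
    for eg in examples:
        # "_session_id" is assumed to be present; fall back to "unknown".
        sessions_by_text[eg["text"]].add(eg.get("_session_id", "unknown"))
    missing = [t for t in expected_texts if t not in sessions_by_text]
    duplicated = {
        text: sorted(sessions)
        for text, sessions in sessions_by_text.items()
        if len(sessions) > 1
    }
    return missing, duplicated
```

For the 1-42 test, something like audit(load_jsonl("annotations.jsonl"), [str(i) for i in range(1, 43)]) would list any numbers that never reached the dataset and any that were saved by both sessions.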

Perhaps an issue on my end and again, not really a problem for us, but just wanted to let you know.

Thank you again for your help.