Prodigy looping over again on the same texts to annotate

ska · September 24, 2021, 3:28pm

Hi,

I've been trying out Prodigy for a few days and I find it very intuitive and useful. However, I keep running into the same problem: whenever I use ner.manual or ner.correct, I hit a "loop issue". After labeling 25 or so texts, I'm sent back to the very first text I was shown, then the second one, and so on. I'm forced to close the interface and relaunch it, which fixes the problem.
Are other people facing the same kind of trouble? Do you have any solution to propose to avoid this behaviour?
Thanks a lot in advance for your help and advice,

Best,
Stéphane

ines · September 27, 2021, 10:42am

Hi! Which version of Prodigy are you using and could you upgrade to the latest if you're not yet on that? Also, could you share some more details on how you're running the recipe, if there's anything in your prodigy.json etc?

ska · October 4, 2021, 9:47am

Hi Ines,

Thanks for your answer!
The version we are using is not the latest one, I think: it's 1.11.3, installed from wheel files. I'll try to update to the newest version.

As for how I'm running the recipe: I'm on Windows, and I did not set up anything in my prodigy.json at first. I did after a while (to make sure my prodigy.db points somewhere else), but the problem occurred in both cases. Apart from that, I did nothing fancy; it was basically a first "hands-on" session.

lee · October 5, 2021, 6:12am

Even though we are using the latest version, 1.11.4, we are experiencing the same issue.

ska · October 5, 2021, 9:31am

Hi!
Apparently, upgrading to version 1.11.4 worked for us. Hope you can find a solution, lee!

ines · October 5, 2021, 2:49pm

Glad you got it working! And it sounds like you were affected by the one problem in v1.11.3 that we fixed in v1.11.4

Could you share some more details on the command you're running and the contents of your prodigy.json?

lee · October 6, 2021, 4:49am

Nothing special in my prodigy.json, and we are currently using the spans.manual recipe.
By the way, our colleague says that the issue disappeared today once he started saving his annotation tasks less frequently. Previously, he saved each task immediately on completion. Now he saves a batch of tasks at a time, and the repetition has gone away.

jdddog · October 6, 2021, 6:25am

I'm having the same issue with version 1.11.4.

I'm using the blocks interface, named multi-user sessions, feed_overlap=True in the recipe and prodigy.json is empty.

webersni · October 8, 2021, 7:09am

Hi!

I'm having this issue, too.

Version 1.11.4, textcat.manual exclusive, txt file as input

Would be great to have a fix for that

ines · October 8, 2021, 9:37am

I think we might have found a problem that could be related and will have a fix for this soon!

ines · October 14, 2021, 11:40am

Just released v1.11.5! Could you re-run your process with the new version and see if it resolves the problem?

valentinoli · December 6, 2021, 8:45am

We were using v1.11.4 and ran into the same problem, scratching our heads a lot until we found this thread (and saw in the changelog that the issue had been fixed in v1.11.5). So we upgraded Prodigy (to v1.11.6) and the problem has not come up again.

marc · December 8, 2021, 4:00am

We are on v1.11.6 and we are getting this bug. I left more details in another thread; I'm simply updating this one since it could help to know whether I am the only one on 1.11.6 seeing this.

valentinoli · December 15, 2021, 9:37am

My initial report stated we were not experiencing any more problems after updating to v1.11.6. However, I confirm now that we are still seeing duplicated tasks. Not as many as before, and none of our annotators have reported any problems like they did before. So my guess is that they don't appear in a loop anymore but only pop up subtly so the annotator doesn't even notice. So it might be that one part of the problem has been fixed, i.e. the "looping tasks" problem. I suspect the remaining part of the problem is related to this thread.

We are using the ner_manual recipe with 4 separate named multi-user sessions (4 dockers, each with 4-5 annotators).

We use "feed_overlap": true since we're building up a reference that relies on multiple annotators annotating the same examples. At some point we tried switching to "exclude_by": "input" to see if that would make any difference, but it didn't. We don't have different questions about the same input yet, so I don't think this parameter is doing anything in that case. We intend to try out active learning later, so I suppose it will be useful then, right?

Anyway, we upgraded to v1.11.6 on 02.12.2021. Here is a summary of the number of tasks vs number of duplicates for each period after this date. The number of duplicates is the number of duplicated (_session_id, _input_hash) pairs, omitting the first occurrence.

Period: 02.12-03.12
Number of tasks: 159
Number of duplicates: 4
Unique _input_hashes in the set of duplicated tasks: 4
Unique _task_hashes in the set of duplicated tasks: 4
Unique _session_ids in the set of duplicated tasks: 2
Unique multi-user sessions in the set of duplicated tasks: 2
-------------------------
Period: 03.12-06.12
Number of tasks: 755
Number of duplicates: 15
Unique _input_hashes in the set of duplicated tasks: 15
Unique _task_hashes in the set of duplicated tasks: 15
Unique _session_ids in the set of duplicated tasks: 1
Unique multi-user sessions in the set of duplicated tasks: 1
-------------------------
Period: 06.12-08.12
Number of tasks: 1945
Number of duplicates: 100
Unique _input_hashes in the set of duplicated tasks: 97
Unique _task_hashes in the set of duplicated tasks: 97
Unique _session_ids in the set of duplicated tasks: 6
Unique multi-user sessions in the set of duplicated tasks: 4
-------------------------
Period: 08.12-09.12
Number of tasks: 230
Number of duplicates: 0
Unique _input_hashes in the set of duplicated tasks: 0
Unique _task_hashes in the set of duplicated tasks: 0
Unique _session_ids in the set of duplicated tasks: 0
Unique multi-user sessions in the set of duplicated tasks: 0
-------------------------
Period: 09.12-10.12
Number of tasks: 1083
Number of duplicates: 6
Unique _input_hashes in the set of duplicated tasks: 6
Unique _task_hashes in the set of duplicated tasks: 6
Unique _session_ids in the set of duplicated tasks: 1
Unique multi-user sessions in the set of duplicated tasks: 1
-------------------------
Period: 10.12-13.12
Number of tasks: 1291
Number of duplicates: 5
Unique _input_hashes in the set of duplicated tasks: 5
Unique _task_hashes in the set of duplicated tasks: 5
Unique _session_ids in the set of duplicated tasks: 3
Unique multi-user sessions in the set of duplicated tasks: 3
-------------------------
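
For anyone who wants to run a similar check on their own data, the counting rule described above (repeated (_session_id, _input_hash) pairs, omitting the first occurrence) can be sketched roughly as follows. This is only a minimal illustration, not the exact script behind the numbers above: the function name count_duplicates and the toy records are made up for the example, and it assumes the annotations have already been loaded as a list of dicts (for instance from a JSONL export of the dataset) that carry the _session_id, _input_hash and _task_hash keys.

```python
def count_duplicates(examples):
    """Summarise repeated (_session_id, _input_hash) pairs.

    The first occurrence of each pair is treated as the original;
    every later occurrence is counted as a duplicate.
    """
    seen = set()
    duplicates = []
    for eg in examples:
        key = (eg["_session_id"], eg["_input_hash"])
        if key in seen:
            duplicates.append(eg)
        else:
            seen.add(key)
    return {
        "tasks": len(examples),
        "duplicates": len(duplicates),
        "unique_input_hashes": len({eg["_input_hash"] for eg in duplicates}),
        "unique_task_hashes": len({eg["_task_hash"] for eg in duplicates}),
        "unique_session_ids": len({eg["_session_id"] for eg in duplicates}),
    }


# Toy records (hypothetical values, only the three relevant keys are shown).
examples = [
    {"_session_id": "ds-alice", "_input_hash": 1, "_task_hash": 11},
    {"_session_id": "ds-alice", "_input_hash": 2, "_task_hash": 12},
    # Same input in another session: expected with feed_overlap, not a duplicate.
    {"_session_id": "ds-bob", "_input_hash": 1, "_task_hash": 11},
    # Same input again in the same session: counted as duplicates.
    {"_session_id": "ds-alice", "_input_hash": 1, "_task_hash": 11},
    {"_session_id": "ds-alice", "_input_hash": 1, "_task_hash": 11},
]

summary = count_duplicates(examples)
print(summary)
```

Keying on the session as well as the input hash matters here: with "feed_overlap": true, the same input is supposed to show up once per annotator, so only repeats within a single session are counted as duplicates.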

ryanwesslen · February 2, 2023, 7:18pm

hi @valentinoli and @marc!

Sorry for the delayed response. We're going back to close past issues.

If you're still looking into this problem, can you try to upgrade to at least Prodigy v1.11.9 (we also just released v1.11.10 that is compatible with spaCy 3.5 this week too)?

We made several fixes including a front end bug that caused duplicates in high latency multi-annotator workflows.

Please reply on that newer post if you experience any problems.
