I've been using Prodigy for a few days and I find it very intuitive and useful. However, I keep running into the same problem: whenever I use ner.manual or ner.correct, I hit a "loop issue", meaning that after labeling around 25 texts, I'm shown the first text again, then the second one, and so on. I'm forced to close the interface and relaunch it, which fixes the problem.
Are other people facing the same kind of trouble? Do you have any suggestions for avoiding this behaviour?
Thanks a lot in advance for your help and advice,
Hi! Which version of Prodigy are you using, and could you upgrade to the latest if you're not on it yet? Also, could you share some more details on how you're running the recipe, and whether there's anything in your prodigy.json?
Thanks for your answer!
The version we're using is not the latest one, I think; it's 1.11.3, installed from wheel files. I'll try updating to the newest version.
Concerning the way I'm using the recipe: I'm on Windows and didn't set up anything in my prodigy.json at first. I did after a while (to make sure my prodigy.db points somewhere else), but the problem occurred in both cases. Apart from that, I did nothing fancy; it was basically a first hands-on session.
Nothing special in my prodigy.json, and we are currently using the spans.manual recipe.
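For what it's worth, pointing the database somewhere else boiled down to something like the snippet below. This is only a sketch with placeholder names, assuming the default SQLite backend; the equivalent settings can also live under "db" / "db_settings" in prodigy.json.

```python
from prodigy.components.db import connect

# Placeholder file name and path; assumes the default SQLite backend
db = connect("sqlite", {"name": "custom_prodigy.db", "path": "C:/prodigy_data"})
print(db.datasets)  # dataset names stored in that file
```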
By the way, a colleague says the issue disappeared today after he started saving his annotation tasks less frequently. He used to save each task immediately on completion; now he saves a batch of tasks at a time, and the repeating issue is gone.
We were using v1.11.4 and ran into the same problem, scratching our heads a lot, until we saw this thread (and noticed in the changelog that the issue had been fixed in v1.11.5). We upgraded Prodigy to v1.11.6 and the problem hasn't come up again.
We're on v1.11.6 here and we are getting this bug. I left more details in another thread; I'm just updating this one since it could help to know whether I'm the only one seeing it on v1.11.6.
My initial report said we weren't experiencing any more problems after updating to v1.11.6. However, I can now confirm that we are still seeing duplicated tasks. There aren't as many as before, and none of our annotators have reported problems like they did previously, so my guess is that the duplicates no longer appear in a loop but only pop up subtly enough that annotators don't notice. It might be that one part of the problem, the "looping tasks", has been fixed; I suspect the remaining part is related to this thread.
We are using the ner.manual recipe with 4 separate named multi-user sessions (4 Docker containers, each with 4-5 annotators).
We use "feed_overlap": true, since we're building up a reference set that relies on multiple annotators annotating the same examples. At some point we tried switching to "exclude_by": "input" to see if that would make any difference, but it didn't. We don't have different questions about the same input yet, so I don't think this setting is doing anything in our case. We intend to try active learning later, so I suppose it will be useful then, right?
Anyway, we upgraded to v1.11.6 on 02.12.2021. Here is a summary of the number of tasks vs. the number of duplicates for each period since that date. The number of duplicates is the number of duplicated (_session_id, _input_hash) pairs, omitting each pair's first occurrence. The "Unique" columns in the table count distinct values within the set of duplicated tasks; a short script that reproduces these counts is sketched after the table.
| Period | Tasks | Duplicates | Unique `_input_hash`es | Unique `_task_hash`es | Unique `_session_id`s | Unique multi-user sessions |
|---|---|---|---|---|---|---|
| 02.12-03.12 | 159 | 4 | 4 | 4 | 2 | 2 |
| 03.12-06.12 | 755 | 15 | 15 | 15 | 1 | 1 |
| 06.12-08.12 | 1945 | 100 | 97 | 97 | 6 | 4 |
| 08.12-09.12 | 230 | 0 | 0 | 0 | 0 | 0 |
| 09.12-10.12 | 1083 | 6 | 6 | 6 | 1 | 1 |
| 10.12-13.12 | 1291 | 5 | 5 | 5 | 3 | 3 |
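For reference, here's roughly the kind of script that produces the counts above. It's only a sketch: the dataset name is a placeholder, and it assumes the v1.11.x `db.get_dataset()` API and named sessions (so every example carries a `_session_id`).

```python
from collections import Counter
from prodigy.components.db import connect

db = connect()  # uses the "db" settings from prodigy.json
examples = db.get_dataset("my_dataset")  # placeholder dataset name

# Count (_session_id, _input_hash) pairs; any pair seen more than once is
# duplicated, and each pair's first occurrence is not counted.
pairs = Counter((eg["_session_id"], eg["_input_hash"]) for eg in examples)
n_duplicates = sum(n - 1 for n in pairs.values() if n > 1)
dup_inputs = {h for (s, h), n in pairs.items() if n > 1}
dup_sessions = {s for (s, h), n in pairs.items() if n > 1}

print("duplicates:", n_duplicates)
print("unique _input_hashes in duplicated tasks:", len(dup_inputs))
print("unique _session_ids in duplicated tasks:", len(dup_sessions))
```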
Sorry for the delayed response! We're going back through older threads to close them out.
If you're still looking into this problem, could you try upgrading to at least Prodigy v1.11.9? (We also released v1.11.10, which is compatible with spaCy 3.5, just this week.)
We made several fixes, including one for a front-end bug that caused duplicates in high-latency multi-annotator workflows.
Please reply on that newer post if you experience any problems.