We have just released Prodigy v1.11.9, which fixes the persistent problem of duplicate examples being shown to annotators. This issue affected multi-annotator workflows and high-latency scenarios, such as having Prodigy and the database deployed on separate remote servers.
This release eliminates the duplicates produced by a tricky front-end bug. Note, though, that a small number of duplicates is still expected in multi-user workflows with `feed_overlap` set to `false`. This is perfectly normal behavior and should only occur towards the end of the example stream.
These "end-of-queue" duplicates come from the work-stealing mechanism in the internal Prodigy feed. "Work-stealing" is a safeguard that prevents records in the stream from being lost when an annotator requests a batch of examples, effectively locking those examples, and then never annotates them. It allows annotators who reach the end of a shared stream to annotate the otherwise locked examples that other annotators are still holding on to. Essentially, we have prioritized annotating every example in your data stream at least once over annotating each example at most once while potentially losing a few.
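To make the tradeoff concrete, here is a minimal toy sketch of a shared feed with batch locking and work stealing. It is not Prodigy's actual implementation: the `SharedFeed` class, its methods, and the annotator names are all hypothetical and only illustrate how stealing locked examples at the end of the stream can yield a duplicate while guaranteeing that nothing is lost.

```python
from collections import Counter, deque


class SharedFeed:
    """Toy model of a shared feed (hypothetical, not Prodigy's internal API)."""

    def __init__(self, examples, batch_size=2):
        self.queue = deque(examples)  # examples nobody has requested yet
        self.batch_size = batch_size
        self.locked = {}              # annotator -> examples sent but not yet answered
        self.annotated = []           # (annotator, example) pairs in arrival order

    def get_batch(self, annotator):
        size = min(self.batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(size)]
        if batch:
            # Normal case: hand out fresh examples and lock them for this annotator.
            self.locked.setdefault(annotator, []).extend(batch)
            return batch
        # Stream exhausted: "steal" examples another annotator is still holding.
        for other, held in self.locked.items():
            if other != annotator and held:
                return held[: self.batch_size]  # copy; the holder keeps them too
        return []

    def answer(self, annotator, example):
        self.annotated.append((annotator, example))
        # Once answered, the example no longer needs to be stolen by anyone.
        for held in self.locked.values():
            if example in held:
                held.remove(example)


feed = SharedFeed(["a", "b", "c", "d"], batch_size=2)
alice_batch = feed.get_batch("alice")  # alice locks "a" and "b", then goes idle
bob_batch = feed.get_batch("bob")      # bob locks "c" and "d"
for ex in bob_batch:
    feed.answer("bob", ex)

stolen = feed.get_batch("bob")         # queue is empty, so bob steals alice's batch
for ex in stolen:
    feed.answer("bob", ex)

feed.answer("alice", "a")              # alice comes back and submits a late answer

counts = Counter(ex for _, ex in feed.annotated)
```

In this run every example ends up annotated at least once, and "a" is annotated twice because Alice's late answer arrives after Bob has already stolen it. Without the stealing step, "a" and "b" would never have been annotated if Alice had not returned.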
For the release of Prodigy v2, we have some exciting new features and a significant redesign of the way examples are sent to different annotators. Specifically, this redesign will remove the need for this tradeoff and should eliminate 100% of unwanted duplicates.
It will also bring more customization to the `feed_overlap` setting, such as setting the number of annotations you want for each task, or configuring a percentage of examples to receive overlapping annotations. We're even working on registering custom policies to distribute work to different annotators.
Apart from this important fix, v1.11.9:
- adds additional logging to Prodigy console if duplicates appear in feed batches
- fixes a bug in the `audio.transcribe` recipe resulting in some elements not being rendered properly
- fixes a bug in the `review` recipe resulting in multiple copies of session IDs being rendered in the UI
- fixes the bug resulting in tab/window refresh causing a loss of unsaved examples
- fixes a bug with the `--wrap` functionality not working properly on the `\n` character
For details, please see the release notes.