Prodigy v1.11.9 release

We have just released Prodigy v1.11.9, which fixes the persistent problem of showing duplicate examples to annotators. This issue affected multi-annotator workflows and high-latency scenarios, e.g. having Prodigy and the database deployed on separate remote servers.

This release eliminates the duplicates produced by a tricky front-end bug. It should be noted, though, that a small number of duplicates is still expected in multi-user workflows with feed overlap set to false. This is perfectly normal behavior and should only occur towards the end of the example stream.

These "end-of-queue" duplicates come from the work-stealing mechanism in the internal Prodigy feed. "Work-stealing" is a safeguard that prevents records in the stream from being lost when an annotator requests a batch of examples, effectively locking them, and then never annotates them. The mechanism allows annotators who reach the end of a shared stream to annotate these otherwise locked examples that other annotators are holding on to. Essentially, we have prioritized annotating every example in your data stream at least once (accepting a few duplicates) over annotating each example at most once (and potentially losing a few).
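To make the trade-off concrete, here is a minimal sketch of the idea in Python. It is not Prodigy's actual feed implementation: the `SharedFeed` class, its methods, and the session names are all hypothetical and exist only to illustrate how stealing locked examples at the end of a stream can lead to an occasional duplicate.

```python
from collections import deque


class SharedFeed:
    """Toy shared feed without overlap (hypothetical, not Prodigy's internals)."""

    def __init__(self, examples, batch_size=2):
        self.queue = deque(examples)  # examples nobody has requested yet
        self.batch_size = batch_size
        self.held = {}                # session -> examples requested but not yet answered
        self.answers = []             # (session, example) pairs that were saved

    def get_batch(self, session):
        batch = []
        while self.queue and len(batch) < self.batch_size:
            batch.append(self.queue.popleft())
        if not batch:
            # The stream is exhausted: fall back to work-stealing.
            batch = self._steal(session)
        self.held.setdefault(session, []).extend(batch)
        return batch

    def _steal(self, session):
        # Hand out examples that another session still holds and nobody has answered.
        mine = self.held.get(session, [])
        answered = {example for _, example in self.answers}
        for other, pending in self.held.items():
            if other == session:
                continue
            stealable = [ex for ex in pending if ex not in mine and ex not in answered]
            if stealable:
                return stealable[: self.batch_size]
        return []

    def answer(self, session, example):
        self.held[session].remove(example)
        self.answers.append((session, example))


feed = SharedFeed(["a", "b", "c", "d"], batch_size=2)
feed.get_batch("alice")            # alice locks "a" and "b", then walks away
for ex in feed.get_batch("bob"):   # bob gets "c" and "d"
    feed.answer("bob", ex)
stolen = feed.get_batch("bob")     # stream is empty, so bob steals alice's batch
for ex in stolen:
    feed.answer("bob", ex)
feed.answer("alice", "a")          # alice comes back late: "a" is now a duplicate
```

In this toy run every example ends up annotated at least once, and "a" is annotated twice because the original holder returned after the example had already been stolen. Without the stealing step, "a" and "b" would have stayed locked if alice had never come back.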

For the release of Prodigy v2, we have some exciting new features and a significant redesign of the way examples are sent to different annotators. Specifically, this redesign will remove the need for this tradeoff and should eliminate 100% of unwanted duplicates.

It will also bring more customization to the feed_overlap setting, such as setting the number of annotations you want for each task or configuring a percentage of examples to receive overlapping annotations. We're even working on registering custom policies to distribute work to different annotators.

Apart from this important fix, v1.11.9:

  • adds additional logging to the Prodigy console if duplicates appear in feed batches
  • fixes a bug in the audio.transcribe recipe that resulted in some elements not being rendered properly
  • fixes a bug in the review recipe that resulted in multiple copies of session IDs being rendered in the UI
  • fixes a bug where refreshing the tab/window caused a loss of unsaved examples
  • fixes a bug with the --wrap functionality not working properly on the \n character

For details, please see the release notes.


Thanks for this! Now that it's compatible with spaCy 3.5, I'm looking forward to giving v1.11.10 a go. I usually install via the pip section of my conda YAML env file (using https://${PRODIGY_LICENSE}@download.prodi.gy/index), but I'm getting the error "No matching distribution found for prodigy". While I spelunk for incompatible requirements, could you confirm that prodigy is available via that additional index?

Thanks!

Thanks @adamkgoldfarb!

What version of Python are you running?

Can you run python -m prodigy stats? :man_facepalming:

Sorry, of course. I don't think we have wheels for 3.11. I can confirm tomorrow. Can you check on 3.10 in the interim? This is good feedback and we'll likely look into 3.11 wheels.

I'm on Python 3.11. I can't run python -m prodigy stats because the environment install fails, which makes me think it's a version conflict issue. I don't want to make you chase down a conflicting requirements issue. I was just making sure that prodigy was indeed available at that pip index and was not, I dunno, awaiting a build process or something!

Ah! Makes total sense. 3.10 works fine! I just got excited about speed-ups in 3.11... But 3.10 is fine for now and I'll install 3.11 when the time comes!

It's worth noting that since this original post, we've also released Prodigy v1.11.10, which simply adds support for spaCy 3.5 and updates other dependencies (fastapi and pydantic).