🚀 Out now: Prodigy v1.10 (plus new video!)

Prodigy v1.10 is out now :tada: It includes a bunch of cool new features: dependency and relation annotation, audio and video annotation, an enhanced manual image UI, more settings for NER annotation (including character-based highlighting and a cool example workflow for creating training data for fine-tuning transformer models), new recipe callbacks for validating answers at runtime and for modifying examples before they're saved to the database, new settings for UI customisation, and lots more! The new version also updates Prodigy to the latest spaCy v2.3, so you can use the new models for Chinese, Japanese, Danish, Polish and Romanian. You can see the full Prodigy v1.10 changelog here: https://prodi.gy/changelog

I've also recorded a little video that walks you through the most exciting new features:

Twitter thread:

Thanks to everyone who helped beta test the new features – your feedback was super valuable :pray:


Thanks to all the team for this great new update and the new features!

I tried to install the new version on Windows, but there seems to be a problem with the spaCy dependency: it requires the previous spaCy version instead of 2.3. From the wheel metadata: Requires-Dist: spacy (<2.3.0,>=2.2.3)

At the moment, a workaround is to install Prodigy first and then force-install spacy==2.3, ignoring the incompatibility warning.
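For reference, the workaround would look something like this (the wheel filename is a placeholder – use whatever your downloaded file is actually called):

```shell
# install the Prodigy wheel first
pip install prodigy-1.10.0-cp38-cp38-win_amd64.whl

# then force the spaCy version, ignoring pip's incompatibility warning
pip install "spacy>=2.3.0,<2.4.0" --upgrade
```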

Thank you! :smiley:

Ah, that's very strange :thinking: I just unzipped the Windows wheel locally and for me, it says the following in METADATA:

Requires-Dist: spacy (<2.4.0,>=2.3.0)

Did you install the wheel on top of a previous installation? Maybe pip messed up and you ended up with stale metadata from the previous version?
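If anyone wants to double-check which spaCy version a downloaded wheel pins without installing it, a wheel is just a zip archive and the pins live in the `*.dist-info/METADATA` file. A quick sketch (the helper name is made up):

```python
import zipfile

def wheel_requires(wheel_path):
    # a wheel is a zip archive; dependency pins live in *.dist-info/METADATA
    with zipfile.ZipFile(wheel_path) as wheel:
        meta_name = next(n for n in wheel.namelist() if n.endswith(".dist-info/METADATA"))
        metadata = wheel.read(meta_name).decode("utf-8")
    # return only the dependency lines, e.g. "Requires-Dist: spacy (<2.4.0,>=2.3.0)"
    return [line for line in metadata.splitlines() if line.startswith("Requires-Dist:")]
```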

I installed it in a clean venv and still got the issue.
Then I did the same test: unzipped the Windows wheel and checked the internal metadata.

I'll try to download the wheel again – maybe I got the wrong version.
Thank you


The fault was on my side... I used the v1.9.10 wheel instead of the 1.10.0 one. With the latest (and correct) one, it's OK. Thanks for the support.

Hey Ines,

Thanks for all your great work on this release. I've been waiting for the before_db callback and I'm delighted it's been implemented. I've spotted a little typo in the code example for it here. It should be .startswith(...):

if eg["image"].startwith("data:") and "path" in eg:
#                  ^ missing an 's' here
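For anyone else picking up the before_db callback, here's a minimal self-contained sketch of that kind of cleanup step with the typo fixed (the callback body is my own illustration, not the exact code from the docs):

```python
def before_db(examples):
    # replace bulky base64-encoded image data with the original file path
    # before the annotations are saved to the database
    for eg in examples:
        if eg.get("image", "").startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples
```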

There's one other thing I noticed in the textcat.teach docs that I wasn't so sure about – it might be new, or I just never noticed it before. There's a --long-text flag in the example, but no description of it, and it doesn't exist in the prodigy-recipes repo.


Thanks, appreciate the attention to detail :pray: Fixed and should be live in a second.

Ah, sorry for the confusion, I think it ended up this way because we weren't sure whether we'd want to deprecate that feature or not. It's always been a bit experimental and only available in the binary workflow.

The idea is that if you have very long texts, you're often still training your model on shorter fragments like sentences and then averaging over the predictions to get the score for a whole document. So there's often no real benefit in annotating whole long documents at once – it just takes much longer and you only get one label per document. The idea of the --long-text mode is to show you one highlighted sentence at a time and collect feedback on that, focusing on the most uncertain scores. If you're training with Prodigy and set the --binary flag, you should be able to update the model from those annotations. However, I'm not sure it's actually better than a more transparent approach where you split up the sentences beforehand.
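The "split beforehand" approach could be as simple as a small preprocessing step over the incoming tasks – one annotation task per sentence instead of one per document. A sketch using a naive punctuation-based splitter as a stand-in (in practice you'd use spaCy's sentencizer; the function name is made up):

```python
import re

def split_long_texts(stream):
    # yield one annotation task per sentence instead of one per document;
    # the regex split is a naive stand-in for spaCy's sentencizer
    for eg in stream:
        sentences = re.split(r"(?<=[.!?])\s+", eg["text"].strip())
        for sent in sentences:
            if sent:
                yield {"text": sent, "meta": dict(eg.get("meta", {}))}
```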


Thanks a million Ines!
