Timeline for SpaCy 3 integration

I'm eager to integrate SpaCy 3 into our recipes. Any estimate of when we might see support (even beta support) for SpaCy 3 in Prodigy?

1 Like

Ha, I was actually going to open a thread about this today, but you beat me to it :smile: For context and completeness, here's an overview of what's coming in spaCy v3:

What already works with Prodigy

You can already train spaCy v3 models using annotations collected in Prodigy by exporting them with data-to-spacy and then running spacy convert to convert the corpus to the new, compact binary .spacy format.

What's coming for Prodigy nightly

There are only a handful of internals (like imports, calls to add_pipe) that have to be adjusted to make Prodigy run with spaCy v3. I've already been working on this in the background and we'll probably have a beta program for users who are interested in testing it (no ETA yet, though, it depends on how things go). The trickier parts are the active learning annotation models and updating from binary annotations in the loop, so those might take a bit more work and might not be ready for the nightly.

Cool stuff that will be possible in the future

  • Easily use transformer models in the loop during annotation (also to semi-automatically create a dataset that you can then use to train a more lightweight downstream model).
  • Integrate Prodigy into end-to-end workflows using spaCy projects with tracked changes – for example, you could have a step spacy project run annotate that updates your corpus, and then re-run spacy project run train if the data has changed, package your model, deploy it, visualize it, whatever.
  • Prodigy can expose custom data readers that load and convert annotations from a dataset or an exported JSONL file and you can use them in your config.cfg. Ideally, I'd love to deprecate Prodigy's train wrapper and just make it very easy to use spacy train with Prodigy instead.
  • Support for dependency matcher patterns.
  • Possibly a bunch of other things I haven't thought of yet :smiley:
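
To make the data reader idea above a bit more concrete, here is a minimal sketch of the loading and converting step using only the standard library. It assumes records shaped like Prodigy's exported JSONL tasks (`text`, `spans` with `start`/`end`/`label`, and `answer`). The function name `read_prodigy_jsonl` is hypothetical, and treating a missing `answer` as accepted is an assumption. Wiring this into a config.cfg would additionally mean wrapping it in a reader registered with spaCy, which is not shown here.

```python
import json
from typing import Iterable, Iterator, List, Tuple

# (start_char, end_char, label) for one annotated span
Entity = Tuple[int, int, str]


def read_prodigy_jsonl(lines: Iterable[str]) -> Iterator[Tuple[str, List[Entity]]]:
    """Yield (text, entities) pairs for accepted records only.

    Hypothetical helper: the field names follow Prodigy-style JSONL,
    but this is not part of Prodigy or spaCy.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        task = json.loads(line)
        # Assumption: records without an "answer" key count as accepted.
        if task.get("answer", "accept") != "accept":
            continue
        entities = [
            (span["start"], span["end"], span["label"])
            for span in task.get("spans", [])
        ]
        yield task["text"], entities


# Two in-memory records standing in for lines of an exported file.
sample = [
    '{"text": "Apple hires in Berlin", "spans": ['
    '{"start": 0, "end": 5, "label": "ORG"}, '
    '{"start": 15, "end": 21, "label": "GPE"}], "answer": "accept"}',
    '{"text": "Not sure about this one", "spans": [], "answer": "reject"}',
]
examples = list(read_prodigy_jsonl(sample))
```

Only the accepted record ends up in `examples`; the rejected one is dropped before it ever reaches training.
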
7 Likes

First of all, congratulations on the new library - you guys rock!

Are you saying no ETA on the beta program or the final release? I'll happily sign up as beta tester if you need any.

1 Like

Thanks! :smiley:

I meant for Prodigy nightly (but obviously the final release as well :sweat_smile:) because there are still a few things to do and it's hard to predict how long that takes. But that's good to know, thanks! I'll keep updating this thread.

1 Like

We'd be interested in trying out any nightly Prodigy builds with SpaCy 3.0 support. We have a custom recipe that doesn't do training in the loop but periodically batch trains and swaps out the model. Potentially, that could alleviate some of the concerns with training transformer models in the loop.
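
For anyone curious what that pattern looks like, here is a rough, library-agnostic sketch of the batch-train-and-swap idea, not our actual recipe. `BatchRetrainer` and `toy_train` are hypothetical names, the training function is a stand-in for a real training run, and retraining on all annotations seen so far (rather than only the newest batch) is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional

Annotation = Dict[str, Any]


class BatchRetrainer:
    """Collect annotations and retrain only once a full batch has arrived."""

    def __init__(self, train_fn: Callable[[List[Annotation]], Any], batch_size: int):
        self.train_fn = train_fn
        self.batch_size = batch_size
        self.pending: List[Annotation] = []  # not yet used for training
        self.seen: List[Annotation] = []     # everything trained on so far
        self.model: Optional[Any] = None     # model currently serving suggestions

    def add(self, annotation: Annotation) -> bool:
        """Store one annotation; return True if the model was swapped."""
        self.pending.append(annotation)
        if len(self.pending) < self.batch_size:
            return False
        self.seen.extend(self.pending)
        self.pending = []
        # Train outside the annotation loop, then replace the model in one step.
        self.model = self.train_fn(self.seen)
        return True


def toy_train(annotations: List[Annotation]) -> Dict[str, int]:
    # Stand-in for a real (slow) training run.
    return {"trained_on": len(annotations)}


retrainer = BatchRetrainer(toy_train, batch_size=3)
swaps = [retrainer.add({"text": f"example {i}"}) for i in range(7)]
```

With a batch size of 3 and 7 annotations, the model is swapped after the third and sixth annotation, and one annotation is left waiting for the next batch.
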

2 Likes

Hi. Wondering if there is any update here, given that spaCy 3.0 has now been released?

Now that spaCy v3.0 stable is out, we can build Prodigy against it and get the nightly ready. Keep an eye on this thread for the announcement of the Prodigy nightly program :slightly_smiling_face:

3 Likes

Update :tada:

1 Like