Support for spaCy v2.1

(Nicolai Bjerre Pedersen) #1

When do you expect support for spaCy v2.1?


(Matthew Honnibal) #2

Probably early next week. We’ve updated the code for it, and are doing some manual testing to make sure we don’t need to tweak any of the active learning heuristics for the new models, in case things changed.

We also wanted to make sure things were fully stable. With Prodigy it’s less convenient for users to install updates than it is for spaCy and other open-source libraries, so we’re a bit more cautious. v2.1 has been pretty well tested because it was on nightly for so long, but we still want to make sure any problems surface before we ask everyone to download a new Prodigy update and retrain all their models.



@honnibal May I just check with this - we’re currently using spacy pretrain on the prebuilt spacy models to prepare for compatibility, and using prodigy’s ner.match recipe to build a training dataset. Should we expect these/any other artefacts to break with the new update, or will we only need to retrain the models themselves?


(Ines Montani) #4

It’s really only the models 🙂

In theory there is a possibility that the tokenization can differ for very specific edge cases. But it’s extremely unlikely that this would affect any of the entity spans you’ve annotated – for this to happen, the character offsets of the entities would have to not map to valid token boundaries anymore. But this is also something you can verify pretty easily yourself: for every span you’ve annotated in a document, Doc.char_span needs to succeed.
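A minimal sketch of that check, assuming your annotations are exported as dicts with a "text" key and a "spans" list of "start"/"end" character offsets (the usual Prodigy task format). The function name find_misaligned_spans is made up for illustration; the only spaCy calls it relies on are nlp.make_doc and Doc.char_span, which returns None when the offsets don't map to token boundaries.

```python
def find_misaligned_spans(nlp, examples):
    """Return (text, start, end) for every annotated span whose character
    offsets do not line up with token boundaries under `nlp`'s tokenizer.

    `find_misaligned_spans` is a hypothetical helper, not part of spaCy
    or Prodigy. `examples` is assumed to be an iterable of dicts shaped
    like {"text": ..., "spans": [{"start": ..., "end": ...}, ...]}.
    """
    misaligned = []
    for eg in examples:
        # Tokenize only; the rest of the pipeline is not needed here.
        doc = nlp.make_doc(eg["text"])
        for span in eg.get("spans", []):
            # Doc.char_span returns None if the offsets are not valid
            # token boundaries.
            if doc.char_span(span["start"], span["end"]) is None:
                misaligned.append((eg["text"], span["start"], span["end"]))
    return misaligned


# Example usage (assumes spaCy is installed; the model name is a placeholder):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   problems = find_misaligned_spans(nlp, examples)
#   print(len(problems), "spans no longer align with token boundaries")
```

If this returns an empty list with the new tokenizer, the existing annotations should carry over unchanged and only the models need retraining.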



Excellent! That makes sense; thanks for the clarification. Looking forward to hearing about progress!



Are there any updates on this? I just updated to spaCy 2.1 and get the following error for “ner.teach”:

“ImportError: cannot import name _cleanup”

Is this due to spaCy 2.1?


(Matthew Honnibal) #7

@BLP Yes, that’s due to spaCy v2.1.

We have a build of Prodigy that works with v2.1, but there are one or two features we’d still like to add, especially pretraining support in the recipes. We also want to keep testing, as we want to make sure we give everyone a smooth experience.

Actually it would be useful to have some external testers as well. If you want to try it out, send us an email?


(Nicolai Bjerre Pedersen) #8

I sent you an email the other day regarding testing. I’ll be happy to start testing the new version. I am streaming data from Google Firestore and will probably use that for saving the annotations as well. I’ll be using Prodigy for textcat today and for a parser for custom semantics soon (somehow).



(Ronaldo V ) #10

“We have a build of Prodigy that works with v2.1”

Is it possible to get the working version?

I have just started with Prodigy, and I have been working with spaCy for a year (successfully!). Downgrading spaCy causes issues with my scripts, and in my current environment it makes no sense to integrate Prodigy now, only to redo that work when a new version is released.

I’m sitting between the chairs…