Support for spaCy v2.1

nix411 · March 19, 2019, 1:34pm

When do you expect support for spaCy v2.1?

honnibal · March 20, 2019, 8:42pm

Probably early next week. We’ve updated the code for it, and are doing some manual testing to make sure we don’t need to tweak any of the active learning heuristics for the new models, in case things changed.

We also wanted to make sure things were fully stable. With Prodigy it’s less convenient for users to install updates than it is for spaCy and other open-source libraries, so we’re a bit more cautious. v2.1 has been pretty well tested because it was on nightly for so long, but we still want to make sure any problems surface before we ask everyone to download a new Prodigy update and retrain all their models.

htebmal · March 28, 2019, 4:17pm

@honnibal May I just check with this - we’re currently using spacy pretrain on the prebuilt spacy models to prepare for compatibility, and using prodigy’s ner.match recipe to build a training dataset. Should we expect these/any other artefacts to break with the new update, or will we only need to retrain the models themselves?

ines · March 28, 2019, 4:51pm

It's really only the models.

In theory there is a possibility that the tokenization can differ for very specific edge cases. But it's extremely unlikely that this would affect any of the entity spans you've annotated – for this to happen, the character offsets of the entities would have to not map to valid token boundaries anymore. But this is also something you can verify pretty easily yourself: for every span you've annotated in a document, Doc.char_span needs to succeed.
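A minimal sketch of that check, assuming your annotations are in Prodigy's usual JSONL shape (a "text" plus a list of "spans" with "start" and "end" character offsets) and that `nlp` is a loaded spaCy pipeline. The function name `find_misaligned_spans` is made up for illustration and is not part of spaCy or Prodigy:

```python
def find_misaligned_spans(nlp, examples):
    """Collect (text, span) pairs whose offsets no longer match token boundaries.

    `nlp` is assumed to be a loaded spaCy pipeline; `examples` is assumed to be
    a list of Prodigy-style dicts with "text" and an optional "spans" list.
    """
    misaligned = []
    for eg in examples:
        # Tokenize only; the other pipeline components are not needed here.
        doc = nlp.make_doc(eg["text"])
        for span in eg.get("spans", []):
            # Doc.char_span returns None if the character offsets
            # do not line up with token boundaries.
            if doc.char_span(span["start"], span["end"]) is None:
                misaligned.append((eg["text"], span))
    return misaligned
```

If this returns an empty list when run with the new model's tokenizer, none of the annotated spans are affected by tokenization changes.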

htebmal · March 28, 2019, 4:53pm

Excellent! That makes sense; thanks for the clarification. Looking forward to hearing about progress!

BLP · March 30, 2019, 12:49pm

Are there any updates on this? I just updated to spacy 2.1 and get the following error for “ner.teach”:

“ImportError: cannot import name _cleanup”

Is this due to spacy 2.1?

honnibal · March 30, 2019, 1:30pm

@BLP Yes, that’s due to spaCy v2.1.

We have a build of Prodigy that works with v2.1, but there’s one or two features we’d still like to add, especially pretraining support in the recipes. We also want to keep testing, as we want to make sure we give everyone a smooth experience.

Actually it would be useful to have some external testers as well. If you want to try it out, send us an email? contact@explosion.ai

nix411 · April 1, 2019, 7:53am

I sent you an email the other day regarding testing from nicolai@plx.ai. I’ll be happy to start testing the new version. I am streaming data from Google Firestore and will probably use that for saving the annotations as well. I’ll be using prodigy for textcat today and for a parser for custom semantics soon (somehow).

ronaldoviber · April 17, 2019, 1:06pm

“We have a build of Prodigy that works with v2.1”

Is it possible to get the working version?

I've just started with Prodigy, and I've been working with spaCy for a year (successfully!). My scripts break when I downgrade spaCy, and in my current environment it makes no sense to integrate Prodigy now only to redo that work when a new version is released.

I'm caught between two stools...

ronaldoviber · April 23, 2019, 11:36am

What’s the status of testing the Prodigy build that works with spaCy 2.1?
I already wrote to the contact address but didn’t get an answer…

I’ve tried to install against 2.1.3, but it crashes with thinc, so I stopped “researching”.

honnibal · April 23, 2019, 12:12pm

Hi Ronaldo,

We didn’t receive your email about this. Apologies for that — if it’s still relevant, perhaps you could resend?

We’ve been testing the new version carefully as once it’s released people will need to retrain their models to upgrade, which is inconvenient. You can always export your Prodigy annotations and use them to train a spaCy v2.1 model, so it shouldn’t affect your total workflow to be using the current version of Prodigy. You shouldn’t have Prodigy in your production runtime, usually: there’s no reason to be running the annotation from the same environment you’re using to run your models in production.
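As a rough sketch of that export route: assuming the annotations have been exported to JSONL (for example with `prodigy db-out`) and were fully annotated by hand, so that each accepted example lists all of its entities, they can be converted to the `(text, {"entities": [...]})` tuples that spaCy v2 training loops accept. The helper name `prodigy_to_spacy_train_data` is hypothetical:

```python
import json


def prodigy_to_spacy_train_data(lines):
    """Convert exported Prodigy JSONL lines to spaCy v2-style training tuples.

    Assumes each line is a JSON object with "text", "spans" and "answer",
    and that accepted examples contain the complete set of entity spans.
    """
    train_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        eg = json.loads(line)
        if eg.get("answer") != "accept":
            continue  # keep only accepted annotations
        entities = [
            (span["start"], span["end"], span["label"])
            for span in eg.get("spans", [])
        ]
        train_data.append((eg["text"], {"entities": entities}))
    return train_data
```

Binary annotations from a recipe like ner.teach only cover one span per example, so they would need different handling than this sketch provides.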

We should be pushing a new patch release of spaCy today that fixes a couple of bugs. Once that’s out, I think the current build we have of Prodigy could be considered a release candidate. If there are no further problems found we’ll go ahead and make the v1.8 release. However, please do be patient.

ronaldoviber · April 24, 2019, 10:02am

That sounds good.
I’m developing a system that is not yet in production. The NLP part is integrated, and the annotations are used in a higher-level evaluation process. In spaCy we use, for example, the EntityRuler, and we are otherwise committed to 2.1 as well. If we go back to 2.0, our current scripts run into problems. So it isn’t productive to adopt Prodigy 1.7 and debug against it when an upgrade is coming in the near future.
So I don’t care about problems with 2.0 models. I have to integrate the frontend functionally, and an RC should be sufficient.
How do I get the RC / v1.8? My licence was bought by my company.
grtz

trevorwelch · April 25, 2019, 1:11am

+1, I would love to join the beta for my current scenario of trying to use ner.match (EDIT: also ner.teach):

Without spaCy v2.1, I’m getting: ValueError: [T002] Pattern length (10) >= phrase_matcher.max_length (10). Length can be set on initialization, up to 10.

With spaCy v2.1, I’m getting: ImportError: cannot import name _cleanup

ines · April 25, 2019, 10:40am

Just out of curiosity, what types of patterns do you have that are 10+ tokens long? While there can always be edge cases (like trying to match a span with lots of punctuation etc. that ends up as lots of tokens), you usually don't want to be matching sequences this long when annotating in Prodigy. Keep in mind that phrase patterns (e.g. "pattern": "some string") really only return exact string matches. So unless your data really contains a lot of mentions of those exact strings, the pattern likely won't be that useful, either.

trevorwelch · April 25, 2019, 1:38pm

Yes, I am looking for exact pattern matches. In my case, the entities I’m looking for overlap a lot with generic, non-entity terms (imagine an entity Red Bumblebee which is the same as the generic term red bumblebee), and the texts aren’t very long, so there is not much context for the model to use when deciding between entity and generic term. This is only my elementary understanding of what’s possible, though, so I will try your suggestion. At least that way I can get started and see how the results are looking.

ines · April 25, 2019, 3:21pm

@trevorwelch Yes, this makes sense. So it looks like you might just have a few outliers in your patterns then? The pattern length limit only occurs if the string you want to match consists of 10 or more tokens in total. If those examples are important for your use case, you can always add annotations for them later – but working with shorter patterns only will probably still let you cover the most frequent entities.
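One way to find those outliers is to count tokens per pattern up front. This is only a sketch: it assumes `nlp` is a loaded spaCy pipeline, that patterns are dicts in the usual patterns-file shape ("label" plus "pattern"), and that the limit is the 10 tokens reported in the error message. The function name `split_patterns_by_length` is hypothetical:

```python
def split_patterns_by_length(nlp, patterns, max_length=10):
    """Separate usable patterns from string patterns that hit the length limit.

    `nlp` is assumed to be a loaded spaCy pipeline. String patterns are
    tokenized and counted; token-based (list) patterns are passed through,
    on the assumption that the limit only concerns phrase patterns.
    """
    short, too_long = [], []
    for entry in patterns:
        pattern = entry["pattern"]
        if isinstance(pattern, str):
            n_tokens = len(nlp.make_doc(pattern))
            # The error is raised when the length is >= the maximum.
            if n_tokens >= max_length:
                too_long.append(entry)
                continue
        short.append(entry)
    return short, too_long
```

The first list can be written back to a patterns file for ner.match, and the second shows which entities would have to be annotated some other way.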

fros1y · May 7, 2019, 5:47pm

Any updates here? Any beta that could be made available?

Thanks,

olivierwa · May 14, 2019, 7:07am

Hi,
I have just purchased Prodigy, but unfortunately it forces me to go back to spaCy 2.0.18 when I have been working with 2.1.3 for several weeks.
I was expecting Prodigy to be on the same version as spaCy.
What should I do?
Thanks,
Olivier
