I’ve trained an NER model and saved it as an importable Python module using the previous version of Prodigy (v0.3.0), following the steps laid out in your Getting Started documentation. After installing the newest version (v0.4.0) of Prodigy, I’ve also reinstalled the saved module – but the results I get when running my NER script with the trained model are now quite different from before: the script finds some completely different entities than it did when I ran it from the virtualenv that had the previous version of Prodigy installed.

I should emphasise that I have not edited the script in any way. All I’ve done is create a new virtualenv, install Prodigy and en_core_web_sm there, and run a Python script that calls spaCy – not Prodigy. I would expect the results to be identical, but they’re completely different.

I remember reading somewhere that one should retrain on one’s annotations after each update, but I thought it would be possible to circumvent this by packaging one’s model into an importable Python module, since this is a spaCy object that has nothing to do with Prodigy per se. Is this not the case? What am I missing here?
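For reference, the script is essentially just this (simplified – I’ve swapped in `my_ner_model` as a placeholder for the actual package name):

```python
# Simplified version of my runtime script. "my_ner_model" stands in
# for the model package built from the Prodigy-trained model.
import my_ner_model

# Load the packaged model the same way spaCy loads any model package,
# e.g. en_core_web_sm.load()
nlp = my_ner_model.load()

doc = nlp(u"Apple is looking at buying a U.K. startup for $1 billion.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```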
Prodigy v0.4.0 uses the latest spaCy alpha version, v2.0.0a17 – whereas the previous version was using spaCy v2.0.0a16, which had a slightly different model architecture. The model you've trained with Prodigy is a spaCy model – and it was trained using the inputs and configuration of the spaCy version you were running at the time. So you'll need to re-train your model in the latest version of Prodigy, which uses the latest version of spaCy – and then run it with that same spaCy version.
The main reason you need to retrain your models after a spaCy update that affects the model architecture is that your training and runtime configuration must match. If the architecture you used to train the model is different from the architecture you're using to retrieve the model's predictions, the results will be very different – and usually much worse.
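By the way, if you ever want to double-check which spaCy version a packaged model targets, you can compare its meta data against the spaCy version installed at runtime. Something like this should work (again using `my_ner_model` as a placeholder for your package):

```python
# Sanity check: compare the spaCy version the model package was trained
# for against the spaCy version installed at runtime.
import spacy
import my_ner_model  # placeholder for your packaged model

nlp = my_ner_model.load()

# The model's meta.json typically records the spaCy version it targets
# under the "spacy_version" key.
print("runtime spaCy:", spacy.__version__)
print("model expects:", nlp.meta.get("spacy_version", "unknown"))
```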
The good news is that this problem should occur less frequently once we release the stable version of spaCy v2.0. Especially in the past few weeks, we've been making lots of improvements to spaCy's parser and entity recognizer to finish up the release candidate – and the models are now much better and faster. But many of these updates also meant we had to train and publish new models – and our alpha users had to re-train theirs.
OK, thanks for clarifying that (and thanks, as always, for the extremely rapid reply). Any idea when the stable version of spaCy v2.0 will be released?