Porting from old to new version

Hi,

I previously trained my NER model with the old version of Prodigy (1.8.4), and I understand that some recipes have changed in the new version.

@ines and @honnibal mentioned that models have to be retrained. Is that the only way? Can I export the binary annotations from the Prodigy database and import them into the new database for training with the latest version of Prodigy?

Also, I previously used ner.teach and ner.batch-train in the old version of Prodigy. However, I understand that the new version has the --binary flag. Would this cause a big difference in my model, or do you recommend I port everything over to the new version of Prodigy and train with the flag set?

Thanks!

There shouldn't really be any breaking changes, though, and the deprecated recipes all still exist and can be used. We've just introduced a bunch of new features and workflows (also see the changelog for details).

Yes, if the model implementations change, you'll have to retrain your model with the new version. We limit those changes to minor and major versions of spaCy only. The only reason we change the implementations is to improve them (and that sometimes means backwards-incompatible changes).

You typically won't have to import or export anything – the Prodigy database is in your user home directory by default, and any version of Prodigy can access the same database. So you could even have multiple versions of Prodigy in different virtual environments that all access the same data.
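For instance (a quick sketch; the dataset name my_ner_data is just a placeholder):

```bash
# Any installed version reads the same default database (~/.prodigy/prodigy.db)
prodigy stats my_ner_data

# If you ever do want the raw annotations as a file, you can still export/import:
prodigy db-out my_ner_data > annotations.jsonl
prodigy db-in my_ner_data_copy annotations.jsonl
```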

Setting the --binary flag on the new train recipe is pretty much equivalent to running ner.batch-train with binary annotations. So there shouldn't be any fundamental difference. However, changes to the training process, model implementations in spaCy and bug fixes can obviously have an impact on your results.
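For example, assuming a dataset ner_binary collected with ner.teach (the names here are placeholders), the two workflows would look roughly like this:

```bash
# Old workflow (deprecated but still available):
prodigy ner.batch-train ner_binary en_core_web_sm --output ./model-old

# New equivalent:
prodigy train ner ner_binary en_core_web_sm --output ./model-new --binary
```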

Hi @ines,

Then may I ask what the purpose of the --binary flag is? What's the difference in the training process under the hood?

Also, if I mix binary annotations together with annotations from the ner.correct recipe in the same Prodigy dataset, would that be an issue for training?

Thanks once again!

Yes, see my comment here for details on what makes training from yes/no annotations special and how it works:

Yes, you shouldn't mix those in the same dataset, because you want to update the model differently depending on the type of annotation. For the binary annotations, you want to set the --binary flag to take advantage of the yes and no decisions and to treat all other tokens as missing values. If you've collected data with ner.correct and the annotations are complete (all entities in the text are labelled), you want to take advantage of that and let the model treat all unlabelled tokens as outside of an entity rather than as missing values. This gives you better results. A rough sketch of the two separate runs follows below.
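Assuming the two datasets are called ner_binary and ner_gold (placeholder names):

```bash
# Binary yes/no annotations from ner.teach: unlabelled tokens are treated as missing
prodigy train ner ner_binary en_core_web_sm --output ./model-binary --binary

# Complete annotations from ner.correct: unlabelled tokens are treated as not entities
prodigy train ner ner_gold en_core_web_sm --output ./model-gold
```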

Thanks a lot for your reply, @ines!

So say I have two Prodigy datasets, one containing binary annotations and another containing annotations from the ner.manual recipe.

Which dataset should I train my blank:en model on first? Thereafter, do I use the updated model as the base model to train on the other dataset?

Thanks.

You'd usually want to start with the manual annotations that contain more information, to give the model as much as possible to learn from. After that, you can improve it with binary annotations, and/or use the pretrained model to collect more relevant data using its suggestions.
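As a sketch, with placeholder dataset names:

```bash
# 1. Train on the complete manual annotations first
prodigy train ner ner_manual blank:en --output ./model-manual

# 2. Use that model as the base and update it with the binary annotations
prodigy train ner ner_binary ./model-manual --output ./model-final --binary
```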