Porting from old to new version

Hi,

I previously trained my NER model with the old version of Prodigy (1.8.4), and I understand that some recipes have changed in the new version.

@ines and @honnibal mentioned that models have to be retrained. Is that the only way? Can I export the binary annotations from the Prodigy database and import them into the new database for training with the latest version of Prodigy?

Also, I previously used ner.teach and ner.batch-train in the old version of Prodigy. However, I understand that the new version has the --binary flag. Would this cause a big difference in my model, or do you recommend I port everything over to the new version of Prodigy and train with the flag set?

Thanks!

There shouldn't really be any breaking changes, though, and the deprecated recipes all still exist and can be used. We've just introduced a bunch of new features and workflows (also see the changelog for details).

Yes, if the model implementations change, you'll have to retrain your model with the new version. We limit those changes to minor and major versions of spaCy only. The only reason we change the implementations is to improve them (and that sometimes means backwards-incompatible changes).

You typically won't have to import or export anything – the Prodigy database is in your user home directory by default, and any version of Prodigy can access the same database. So you could even have multiple versions of Prodigy in different virtual environments that all access the same data.
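For instance (a quick sketch; the dataset name my_ner_data is just a placeholder):

```bash
# Any installed version reads the same default database (~/.prodigy/prodigy.db)
prodigy stats my_ner_data

# If you ever do want the raw annotations as a file, you can still export/import:
prodigy db-out my_ner_data > annotations.jsonl
prodigy db-in my_ner_data_copy annotations.jsonl
```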

Setting the --binary flag on the new train recipe is pretty much equivalent to running ner.batch-train with binary annotations. So there shouldn't be any fundamental difference. However, changes to the training process, model implementations in spaCy and bug fixes can obviously have an impact on your results.
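For example, assuming a dataset ner_binary collected with ner.teach (the names here are placeholders), the two workflows would look roughly like this:

```bash
# Old workflow (deprecated but still available):
prodigy ner.batch-train ner_binary en_core_web_sm --output ./model-old

# New equivalent:
prodigy train ner ner_binary en_core_web_sm --output ./model-new --binary
```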

Hi @ines,

Then may I ask what the purpose of the --binary flag is? What's the difference in the training process under the hood?

Also, if I mix binary annotations together with annotations from the ner.correct recipe in the same Prodigy dataset, would that be an issue for training?

Thanks once again!

Yes, see my comment here for details on what makes training from yes/no annotations special and how it works:

Yes, you shouldn't mix those in the same dataset, because you want to update the model differently depending on the type of annotation. For the binary annotations, you want to set the --binary flag to take advantage of the yes and no decisions and to treat all other tokens as missing values. If you've collected data with ner.correct and the annotations are complete (all entities in the text are labelled), you want to take advantage of that and let the model treat all unlabelled tokens as outside of an entity rather than as missing values. This gives you better results. A rough sketch of the two separate runs follows below.
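Assuming the two datasets are called ner_binary and ner_gold (placeholder names):

```bash
# Binary yes/no annotations from ner.teach: unlabelled tokens are treated as missing
prodigy train ner ner_binary en_core_web_sm --output ./model-binary --binary

# Complete annotations from ner.correct: unlabelled tokens are treated as not entities
prodigy train ner ner_gold en_core_web_sm --output ./model-gold
```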

Thanks a lot for your reply, @ines!

So say I have two Prodigy datasets, one containing binary annotations and another containing annotations from the ner.manual recipe.

Which dataset should I train my blank:en model on first? Thereafter, do I use the updated model as the base model to train on the other dataset?

Thanks.

You'd usually want to start with the manual annotations that contain more information, to give the model as much as possible to learn from. After that, you can improve it with binary annotations, and/or use the pretrained model to collect more relevant data using its suggestions.
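As a sketch, with placeholder dataset names:

```bash
# 1. Train on the complete manual annotations first
prodigy train ner ner_manual blank:en --output ./model-manual

# 2. Use that model as the base and update it with the binary annotations
prodigy train ner ner_binary ./model-manual --output ./model-final --binary
```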