Training a model on both gold and binary data

ines · January 31, 2020, 11:15am

The --binary flag uses a more complex mechanism to update the model that lets you take advantage of both the accepted and rejected suggestions of single entity spans – basically, like the default behaviour of ner.batch-train. I've written some more about it on this thread:

The --ner-missing flag lets you specify that unannotated tokens should be treated as missing values (and not explicitly as "not an entity"). This allows you to still train from incomplete annotations, e.g. if you only have annotations for one or two labels.
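
To make the difference concrete, here's a minimal sketch in plain Python of what "missing" vs. "not an entity" means at the token level. It's only an illustration, not Prodigy's or spaCy's actual implementation: the helper `spans_to_biluo` and its token-index span format are made up for this example. It does follow spaCy's convention of using "O" for "outside any entity" and "-" for a missing value in BILUO tags.

```python
def spans_to_biluo(tokens, spans, missing=False):
    """Convert token-level spans to BILUO tags.

    tokens:  list of token strings
    spans:   list of (start_token, end_token, label), end_token inclusive
             (hypothetical format, chosen for this sketch)
    missing: if True, unannotated tokens become "-" (unknown),
             otherwise "O" (explicitly not an entity)
    """
    default = "-" if missing else "O"
    tags = [default] * len(tokens)
    for start, end, label in spans:
        if start == end:
            # Single-token entity
            tags[start] = f"U-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
            tags[end] = f"L-{label}"
    return tags


tokens = ["Apple", "opened", "a", "store", "in", "New", "York"]
# Only ORG was annotated; "New York" was never looked at.
spans = [(0, 0, "ORG")]

gold_tags = spans_to_biluo(tokens, spans, missing=False)
partial_tags = spans_to_biluo(tokens, spans, missing=True)
print(gold_tags)
print(partial_tags)
```

With `missing=False`, the model would be told that "New York" is definitely not an entity, which is wrong if you simply didn't annotate locations. With `missing=True`, those tokens carry no information either way, so the model isn't penalised for predicting an entity there.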

The new train command was designed to harmonise training between Prodigy and spaCy, and use spaCy's regular update mechanism to train from gold-standard data (with and without missing values). This also makes it easier to ensure that results are consistent and reproducible.
