Sorry about the late reply! I think what you're experiencing might be what's often referred to as the "catastrophic forgetting" problem: as your model learns the new entity type, it "forgets" what it previously learned. In your example, the effect is pretty significant, but that's likely because you've trained a completely new entity type, so the only data the model is updating on is examples labelled TECH and none of the other entity types. Because the model is never "reminded" of the other types, it overfits on the new data.
This blog post we've published has some more background on this, including strategies to prevent it. One approach is to mix in examples the model previously got right and train on both those and the new examples.
This is pretty easy to do in Prodigy: after collecting annotations for your new TECH entity, run the model over the same input text and annotate the other labels. You can add all annotations to the same dataset and then train your model on those examples. Make sure to always use input data that's similar to what the model will have to process at runtime. This might also give you a small boost in accuracy over the standard English model, because you're also improving the existing entity types on your specific data.
Alternatively, you can also generate those examples yourself by running spaCy over a bunch of text and selecting the entities you care about most. (See the PRODIGY_README.html for details on the JSONL format – all you need to do is convert the entity annotations to that format and then import them to your dataset using the db-in command.)
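If it helps, here's a minimal sketch of what that conversion could look like. The helper name `to_prodigy_format` and the output file name are just for illustration – the key part is the `{"text": ..., "spans": [...]}` structure with character offsets and labels:

```python
import json

def to_prodigy_format(text, entities):
    # entities: list of (start_char, end_char, label) tuples, e.g. what
    # you'd get from spaCy via (ent.start_char, ent.end_char, ent.label_)
    # for ent in doc.ents
    return {
        "text": text,
        "spans": [
            {"start": start, "end": end, "label": label}
            for start, end, label in entities
        ],
    }

# One example task with an ORG entity, as spaCy's pretrained model
# would predict it
example = to_prodigy_format(
    "Apple is looking at buying a U.K. startup.",
    [(0, 5, "ORG")],
)

# Write one JSON object per line, so the file can be imported with db-in
with open("silver_annotations.jsonl", "w", encoding="utf8") as f:
    f.write(json.dumps(example) + "\n")
```

You'd then import the file with something like `prodigy db-in your_dataset silver_annotations.jsonl`.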