NER Teach has lower accuracy for other labels

I am new to Prodigy.
I recently trained a model to identify FOOD-related words, starting from the “en_core_web_lg” model, and the accuracy is quite impressive.

However, when I tried the sentence below, I found something strange.

95% onion.
------------old model----------------
[('95%', 'PERCENT')]
------------new model----------------
[('95', 'WORK_OF_ART'), ('%', 'WORK_OF_ART'), ('onion', 'FOOD')]

It seems the pre-trained model “en_core_web_lg” can identify some other labels that are also important to me. My own trained model ignores most of them and only identifies “FOOD” (which is good) and “WORK_OF_ART” (I am not sure where that comes from).

Could you please explain why it behaves like this?


If you don’t need the previous entities, can you try training from the en_vectors_web_lg model instead of en_core_web_lg? That model contains only word vectors and no pre-trained entity recognizer, so you won’t be getting the incorrect predictions for the old labels.

The next version should behave a bit better when you add an entity type to an existing model, as we implemented a special initialisation procedure in spaCy for that workflow. But by default it’s difficult to add a category to an existing model without messing up the weights it has already learned for the other entity types.
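
If you want to check how much of the original model's behaviour a retrained model has kept, one option is to diff the labels the two models predict for the same text. This is only a minimal sketch in plain Python using the tuples from the question above; `diff_labels` is a hypothetical helper written for illustration, not part of Prodigy or spaCy.

```python
def diff_labels(old_ents, new_ents):
    """Compare the entity labels two models predict for the same text.

    Both arguments are lists of (text, label) tuples.
    Returns (lost, gained): labels only the old model predicted,
    and labels only the new model predicted, each sorted.
    """
    old_labels = {label for _, label in old_ents}
    new_labels = {label for _, label in new_ents}
    lost = sorted(old_labels - new_labels)
    gained = sorted(new_labels - old_labels)
    return lost, gained


# Predictions copied from the question for the text "95% onion."
old_ents = [("95%", "PERCENT")]
new_ents = [("95", "WORK_OF_ART"), ("%", "WORK_OF_ART"), ("onion", "FOOD")]

lost, gained = diff_labels(old_ents, new_ents)
print("lost:", lost)
print("gained:", gained)
```

With spaCy, the input tuples can be built from a processed document as [(ent.text, ent.label_) for ent in doc.ents]. Running the comparison over a handful of sentences that contain the labels you care about should show fairly quickly whether the retrained model still predicts them.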