NER Teach has lower accuracy for other labels

Hi,
I am new to Prodigy.
I recently trained a model to identify FOOD-related words, based on the "en_core_web_lg" model, and the accuracy is quite impressive.
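For reference, I collected the annotations roughly like this (the dataset name and source file are placeholders, and the exact flags may differ between Prodigy versions):

```
prodigy ner.teach food_dataset en_core_web_lg food_texts.jsonl --label FOOD
```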

However, when I tried the sentence below, I noticed something strange.

95% onion.
------------old model----------------
[('95%', 'PERCENT')]
------------new model----------------
[('95', 'WORK_OF_ART'), ('%', 'WORK_OF_ART'), ('onion', 'FOOD')]
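For context, I'm comparing the two models with something like this (the path to my trained model is illustrative):

```python
import spacy

old_nlp = spacy.load("en_core_web_lg")
new_nlp = spacy.load("/tmp/food-model")  # illustrative path to my trained model

text = "95% onion."
for name, nlp in [("old", old_nlp), ("new", new_nlp)]:
    doc = nlp(text)
    print(name, [(ent.text, ent.label_) for ent in doc.ents])
```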

It seems the pre-trained "en_core_web_lg" model can identify some other labels that are also important to me, while my own trained model ignores most of them and only identifies "FOOD" (which is good) and "WORK_OF_ART" (I'm not sure where that comes from).

Could you please explain why it behaves like this?

Cheers

If you don't need the previous entity types, can you try training from the en_vectors_web_lg model instead of en_core_web_lg? That package only ships word vectors and no pre-trained entity recognizer, so there are no existing predictions to mess up. When you update en_core_web_lg with only FOOD annotations, the weights for the other entity types drift, which is where odd labels like WORK_OF_ART come from.
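For example, the retraining command could look something like this (dataset name and output path are placeholders, and the exact flags may vary between versions):

```
prodigy ner.batch-train food_dataset en_vectors_web_lg --output /tmp/food-model --label FOOD
```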

The next version should behave a bit better when you add an entity type to an existing model, as we implemented a special initialisation procedure in spaCy for that workflow. But by default it’s difficult to add a category to an existing model without messing up the weights.
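In the meantime, if you do need to keep the old entity types, one general workaround is "pseudo-rehearsal": mix in examples labelled by the original model while you update, so the existing weights stay anchored as the new label is learned. Here's a rough spaCy v2-style sketch; the texts, iteration count, and dropout are all illustrative:

```python
import random
import spacy

# Use the original model to label some "revision" texts, so the old
# entity types (PERCENT, DATE, etc.) are rehearsed during the update.
nlp = spacy.load("en_core_web_lg")
revision_texts = ["Revenue grew 95% in March.", "The film came out in 2012."]
revision_data = []
for doc in nlp.pipe(revision_texts):
    ents = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    revision_data.append((doc.text, {"entities": ents}))

# Hand-annotated examples for the new label.
food_data = [("95% onion.", {"entities": [(4, 9, "FOOD")]})]

ner = nlp.get_pipe("ner")
ner.add_label("FOOD")
optimizer = nlp.resume_training()  # keep the existing weights
for i in range(10):
    examples = revision_data + food_data
    random.shuffle(examples)
    for text, annotations in examples:
        nlp.update([text], [annotations], sgd=optimizer, drop=0.35)
```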