Improving on spaCy's existing NER entities

jsnleong · December 3, 2019, 1:43am

I am currently trying to train an NER model to recognise the "PERSON" entity type. I understand that the current spaCy language models (e.g. en_core_web_lg) already include a trained "PERSON" entity.

However, the pre-trained NER alone is not good enough. Because of the way my notes are structured, it may miss certain Asian names, or it erroneously tags irrelevant keywords.

I would like to improve the existing model, and my approach is as follows. Please correct me if I am going in the wrong direction.

1. I created an EntityRuler to detect certain full-name keywords and inserted it into the pipeline before the 'ner' component.
2. I exported the model and ran the ner.make-gold recipe using the exported model as the baseline.
   a) Does the statistical model pick up anything from the EntityRuler?
3. After correcting some wrong tags made by the model, I saved my annotations and batch-trained on them.
   a) Should I train on my annotations with a fresh blank model, or on the existing exported model?
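
For readers following along, steps 1 and 2 (adding the ruler and exporting the pipeline) might look roughly like the sketch below. It assumes the spaCy v3 API, which differs from the v2 API current when this thread was written. The names in the patterns, the sample sentence, and the helper `add_person_ruler` are made up for illustration, and a blank English pipeline stands in for en_core_web_lg so the snippet runs without downloading a model.

```python
import tempfile

import spacy

# Hypothetical patterns: one exact phrase and one token-level pattern.
PERSON_PATTERNS = [
    {"label": "PERSON", "pattern": "Tan Ah Kow"},
    {"label": "PERSON",
     "pattern": [{"LOWER": "lim"}, {"LOWER": "mei"}, {"LOWER": "ling"}]},
]


def add_person_ruler(nlp, patterns):
    """Add an entity ruler, placed before 'ner' when the pipeline has one."""
    if "ner" in nlp.pipe_names:
        # Spans set by the ruler are kept; the statistical NER predicts around them.
        ruler = nlp.add_pipe("entity_ruler", before="ner")
    else:
        ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(patterns)
    return ruler


# Stand-in for spacy.load("en_core_web_lg"), an assumption made for this sketch.
nlp = spacy.blank("en")
add_person_ruler(nlp, PERSON_PATTERNS)

doc = nlp("Seen by Dr Tan Ah Kow and nurse Lim Mei Ling today.")
ents = [(ent.text, ent.label_) for ent in doc.ents]

# Export the pipeline to a directory that can be passed to a recipe as the model.
with tempfile.TemporaryDirectory() as out_dir:
    nlp.to_disk(out_dir)
    reloaded = spacy.load(out_dir)
    reloaded_pipes = list(reloaded.pipe_names)
```

With a pretrained pipeline in place of the blank one, the same helper would insert the ruler ahead of the existing 'ner' component, which is the setup described in step 1.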

After that, I noticed that the trained model still tags some entities wrongly, so I decided to run the ner.make-gold recipe again. However, when I run the recipe with the trained model, I have to start annotating from the beginning of my dataset again. Why is that happening? This is not the case when I go back to the exported model (from step 2): it continues from where I stopped.

Kindly advise. Thanks!

honnibal · December 5, 2019, 9:49am

Hey,

I think your approach sounds pretty good. To answer your question about the EntityRuler: once the annotations are in the dataset after ner.make-gold, it won't matter whether they were initially predicted by the model or the ruler. They'll still be in there, and the model will learn from them.

To answer your second question, you can try either. I usually recommend training from a blank model, but in your case, since you're using an existing entity type, resuming training may work for you. Give it a try and see.
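
As a rough illustration of the two options, here is a minimal sketch of the same update loop started either from a blank pipeline or from an existing one. It assumes the spaCy v3 training API rather than Prodigy's batch-train recipe, and the function `train_person_ner`, the two training sentences, and the iteration count are hypothetical. Only the blank-model path is exercised here; the `resume=True` path assumes a loaded pretrained pipeline that already has an 'ner' component.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical annotations as (text, {"entities": [(start, end, label)]}).
TRAIN_DATA = [
    ("Patient seen by Tan Ah Kow today.", {"entities": [(16, 26, "PERSON")]}),
    ("Referred to Lim Mei Ling for review.", {"entities": [(12, 24, "PERSON")]}),
]


def train_person_ner(nlp, train_data, resume, n_iter=5, seed=0):
    """Update the NER component, either resuming or starting from scratch."""
    random.seed(seed)
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]
    if resume:
        # Keep the existing weights and continue from them.
        optimizer = nlp.resume_training()
    else:
        # Fresh component with newly initialised weights.
        ner = nlp.add_pipe("ner")
        ner.add_label("PERSON")
        optimizer = nlp.initialize(lambda: examples)

    losses = {}
    # Only update 'ner'; other components, if any, are left untouched.
    with nlp.select_pipes(enable="ner"):
        for _ in range(n_iter):
            random.shuffle(examples)
            losses = {}
            nlp.update(examples, sgd=optimizer, losses=losses)
    return losses


# Blank-model variant; for resuming, pass a loaded pipeline and resume=True.
nlp_blank = spacy.blank("en")
final_losses = train_person_ner(nlp_blank, TRAIN_DATA, resume=False)
```

Comparing both variants on a held-out set of corrected annotations is one way to act on the "give it a try and see" advice; two sentences are far too few for a real comparison and only show the mechanics.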

Finally, I think @ines's answer here will explain the situation with the feed starting from the beginning: Duplicates in ner.manual
