How to train a NER model with unbalanced entities?

damiano · May 10, 2019, 8:19am

Hello,
I have built an NER model with many custom labels, and it works really well. The training is based on a corpus of 300k sentences.
Now the problem is that I have to add another label, but there are only ~7k annotated sentences for it, so the data is very unbalanced.
I see two options:

1. Reduce the annotated sentences from 300k to 7k to get a balanced distribution of the labels. But this will decrease the accuracy of the labels that were in the original 300k sentences.
2. Train the model with the original labels on the 300k sentences (so good accuracy, as I wrote above) and then update the model with the new label on only 7k sentences (which are basically the first 7k of the 300k sentences I mentioned). But in this case I think the new label will carry too little weight…

So… what can I do?
Thanks

honnibal · May 11, 2019, 4:52pm

The first thing to try is obviously the simplest, so: what happens if you just add the new annotations and retrain? You can try a few things like upsampling the sentences with the rare classes, but collecting more annotations for them is likely to be a more effective approach. Using pattern rules can also help, if you’re finding that phrases that match these entities exactly still aren’t being recognised by the model.
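As a rough sketch of the upsampling idea, assuming the training data is a list of `(text, spans)` pairs where each span is `(start, end, label)`: the helper name `upsample_rare`, the `factor` parameter and the label names below are made up for illustration and are not part of spaCy or Prodigy.

```python
import random

def upsample_rare(examples, rare_label, factor=3, seed=0):
    """Repeat every sentence that contains `rare_label` `factor` times.

    `examples` is assumed to be a list of (text, spans) pairs, where
    spans is a list of (start, end, label) tuples. Hypothetical helper.
    """
    rng = random.Random(seed)
    out = []
    for text, spans in examples:
        has_rare = any(label == rare_label for _, _, label in spans)
        copies = factor if has_rare else 1
        out.extend([(text, spans)] * copies)
    # Shuffle so the duplicated sentences are spread across batches.
    rng.shuffle(out)
    return out

# Toy data: one sentence with the new label, two without it.
data = [
    ("Acme shipped the X200.", [(17, 21, "NEW_LABEL")]),
    ("Acme is based in Berlin.", [(0, 4, "ORG")]),
    ("Nothing to see here.", []),
]
upsampled = upsample_rare(data, "NEW_LABEL", factor=3)
```

The duplicated sentences still carry all their other labels, so a large `factor` also shifts the balance of whatever else appears in them. It is worth comparing per-label accuracy against the plain "add the annotations and retrain" baseline.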

Finally, I’ve only just thought of this so maybe it’s not effective, but you could try training a text classifier to predict which sentences contain at least one example of the entity you’re trying to annotate more of. You could then use this text classifier to select sentences that will be better targets for annotation.
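A minimal sketch of that selection step, under some assumptions: the existing NER annotations are `(text, spans)` pairs, and `score_fn` is any callable that returns the classifier's probability that a sentence contains the entity. The function names, the `DRUG` label and the keyword-based `toy_score` are placeholders for illustration. In practice the score would come from whatever text classifier you trained.

```python
def to_textcat_examples(ner_examples, target_label):
    """Derive binary classifier labels from NER annotations:
    True if the sentence has at least one span with `target_label`."""
    return [
        (text, any(label == target_label for _, _, label in spans))
        for text, spans in ner_examples
    ]

def select_for_annotation(sentences, score_fn, top_k=100, threshold=0.5):
    """Keep the sentences the classifier thinks are likely to contain
    the entity, highest score first, limited to `top_k`."""
    scored = [(score_fn(sentence), sentence) for sentence in sentences]
    likely = [pair for pair in scored if pair[0] >= threshold]
    likely.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in likely[:top_k]]

# Stand-in for a trained classifier (assumption, not a real model).
def toy_score(sentence):
    return 1.0 if "aspirin" in sentence.lower() else 0.0

annotated = [
    ("Take aspirin daily.", [(5, 12, "DRUG")]),
    ("Call me tomorrow.", []),
]
train_pairs = to_textcat_examples(annotated, "DRUG")
queue = select_for_annotation(["Aspirin helps.", "Hello there."], toy_score)
```

The sentences in `queue` would then go to the annotator first, so more of the annotation time is spent on text that actually contains the rare entity.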
