I have built a NER model with many custom labels, and it works really well. It was trained on a corpus of 300k sentences.
Now the problem is that I have to add another label, but only ~7k sentences are annotated with it, so the data is very unbalanced.
I see two options:
1. Reduce the training set from 300k sentences to the 7k that carry the new label, so that the labels are evenly distributed. In this case, though, accuracy will drop for the labels that were learned from the full 300k sentences.
2. Train the model on the original labels with the full 300k sentences (so good accuracy, as I wrote above), and then update the model with the new label using only the 7k sentences (which are basically the first 7k of the 300k sentences mentioned above). In this case, though, I think the new label will carry little weight.
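
To make the two options concrete, here is a rough sketch of the training sets each one would use. It is only an illustration: the corpus is a scaled-down toy stand-in (300 and 7 sentences instead of ~300k and ~7k), the `(text, spans)` format and the label names `OLD` and `NEW` are assumptions, and no actual training is done.

```python
from collections import Counter

# Toy stand-in for the real corpus: (sentence, [(start, end, label), ...]).
# Sizes are scaled down from ~300k and ~7k; "OLD" and "NEW" are placeholder labels.
N_TOTAL, N_NEW = 300, 7
NEW_LABEL = "NEW"

corpus = []
for i in range(N_TOTAL):
    spans = [(0, 5, "OLD")]                 # every sentence has an original label
    if i < N_NEW:
        spans.append((6, 10, NEW_LABEL))    # only the first N_NEW have the new one
    corpus.append(("alpha beta", spans))


def label_counts(sentences):
    """Count how many annotated spans each label has in a set of sentences."""
    return Counter(label for _, spans in sentences for _, _, label in spans)


def strip_label(sentences, label):
    """Return a copy of the sentences without any spans of the given label."""
    return [(text, [s for s in spans if s[2] != label]) for text, spans in sentences]


# Option 1: train from scratch on only the sentences that carry the new label.
option_1 = corpus[:N_NEW]

# Option 2: stage 1 uses the full corpus with the original labels only,
# stage 2 updates the model on the small subset that has the new label.
option_2_stage_1 = strip_label(corpus, NEW_LABEL)
option_2_stage_2 = corpus[:N_NEW]

print("option 1:        ", dict(label_counts(option_1)))
print("option 2 stage 1:", dict(label_counts(option_2_stage_1)))
print("option 2 stage 2:", dict(label_counts(option_2_stage_2)))
```

Option 1 is balanced but sees very little data for the original labels, while option 2 sees the new label only in the small second stage.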
So, what can I do?