I am trying to improve my NER model by adding annotations.
The problem is that the corpus is the same: the first time I update() the model I pass 10 annotations, and then I re-process the same document passing the 10 plus 5 new annotations. This “method” works, in the sense that spaCy will always see all the annotations for each document I use. But what happens if I use `ner.batch-train` with the `--label` parameter? Will spaCy only see the label I would like to improve, without all the other annotations of the same document?
If yes, I think it will decrease the accuracy for the other labels a lot.
What are the correct steps to follow to improve the accuracy of a single label WITH sentences that have already been used for training?
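For context, the merging step I am doing now (before calling update() again) looks roughly like this. This is just a standalone sketch: the function name and the `(start, end, label)` span format are my own, not a spaCy API.

```python
def merge_spans(old_spans, new_spans):
    """Combine previously used entity spans with newly collected ones,
    dropping exact duplicates, so each update sees the full annotation set.
    Spans are (start, end, label) tuples (an illustrative format)."""
    return sorted(set(old_spans) | set(new_spans))


old = [(0, 5, "ORG"), (12, 20, "PERSON")]   # 10 original annotations (abbreviated)
new = [(0, 5, "ORG"), (30, 36, "GPE")]      # 5 new ones, with one overlap
print(merge_spans(old, new))
# [(0, 5, 'ORG'), (12, 20, 'PERSON'), (30, 36, 'GPE')]
```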
You might find the answer here helpful, as I explained the approach in some detail: Work Flow for extending an NER model with new entity types
The short answer is that without the `--no-missing` flag, the `ner.batch-train` recipe will assume that the examples might contain missing annotations.
@honnibal Does this mean that without that flag, the accuracy of the other entities will not decrease when using the same corpus?
That’s awesome! In this case I can improve the accuracy of this new type without compromising the others.
Yes, without the `--no-missing` flag, all tokens that aren’t annotated are treated as missing values, and the feedback the model gets is basically “don’t know what this is, could be an entity, could be no entity”. If the flag is set, unannotated tokens are treated as “outside of an entity”, so the feedback is “not an entity”.