ner.batch-train with single label

damiano · June 4, 2019, 8:52am

Hello,
I am trying to improve my NER model by adding annotations.
The problem is that the corpus stays the same. The first time, I call update() on the model with 10 annotations; then I re-process the same document and pass the 10 old annotations plus 5 new ones. This approach works, in the sense that spaCy always sees all the annotations for each document I use. But what happens if I use ner.batch-train with the --label parameter? Will spaCy only see the label I want to improve, without all the other annotations in the same document?

If so, I think it will decrease the accuracy for the other labels (a lot).

What are the correct steps to improve the accuracy of a single label using sentences that have already been used for training?

honnibal · June 5, 2019, 7:31pm

You might find the answer here helpful, as I explained the approach in some detail: Work Flow for extending an NER model with new entity types

The short answer is that without the --no-missing flag, the ner.batch-train recipe will assume that the examples might contain missing annotations.
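To make the difference concrete, here is a minimal sketch of how span annotations could be turned into per-token BILUO tags when you train on a single label. It is not Prodigy's implementation: the (start, end, label) span format, the function name `spans_to_biluo` and its `only_label` / `no_missing` parameters are hypothetical, and it assumes (as the question suggests) that training with --label keeps only the spans of that label. The "-" marker follows spaCy's convention for a missing NER tag.

```python
from typing import List, Optional, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]

MISSING = "-"  # spaCy's marker for an unknown/missing NER tag


def spans_to_biluo(n_tokens: int, spans: List[Span],
                   only_label: Optional[str] = None,
                   no_missing: bool = False) -> List[str]:
    """Convert token spans to BILUO tags (illustrative sketch only).

    Tokens not covered by a kept span become "O" when no_missing is True
    (i.e. "definitely not an entity"), otherwise "-" (i.e. "unknown").
    """
    default = "O" if no_missing else MISSING
    tags = [default] * n_tokens
    for start, end, label in spans:
        # Assumption: restricting to one label drops the other spans.
        if only_label is not None and label != only_label:
            continue
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"  # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # inside tokens
            tags[end - 1] = f"L-{label}"  # last token
    return tags


if __name__ == "__main__":
    # "Apple hired Tim Cook in Cupertino"
    spans = [(0, 1, "ORG"), (2, 4, "PERSON"), (5, 6, "GPE")]
    print(spans_to_biluo(6, spans, only_label="PERSON"))
    print(spans_to_biluo(6, spans, only_label="PERSON", no_missing=True))
```

The first call prints `['-', '-', 'B-PERSON', 'L-PERSON', '-', '-']`: "Apple" and "Cupertino" are simply unknown, so nothing tells the model they aren't entities. The second prints `['O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O']`, which would teach the model that "Apple" and "Cupertino" are not entities; that is the accuracy drop for the other labels you were worried about.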

damiano · June 5, 2019, 8:57pm

@honnibal, does that mean that without that flag, training on the same corpus won't decrease the accuracy of the other entities?

That's awesome! In that case I can improve the accuracy of the new type without compromising the others.

ines · June 6, 2019, 8:24am

Yes, without the --no-missing flag, all tokens that aren’t annotated are treated as missing values, and the feedback the model gets is basically “don’t know what this is, could be an entity, could be no entity”. If the flag is set, unannotated tokens are treated as “outside of an entity”, so the feedback is “not an entity”.
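A toy way to see why missing values protect the other labels is an error count that skips tokens whose gold tag is missing. This is only an illustration under that assumption: spaCy's NER is transition-based and doesn't compute its gradients like this, and the function name `masked_errors` is hypothetical.

```python
from typing import List

MISSING = "-"  # spaCy's marker for an unknown/missing NER tag


def masked_errors(predicted: List[str], gold: List[str]) -> int:
    """Count tag mistakes, ignoring tokens whose gold tag is missing."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    # A missing gold tag means "could be anything", so it never counts as wrong.
    return sum(1 for p, g in zip(predicted, gold) if g != MISSING and p != g)


if __name__ == "__main__":
    # The model already recognises ORG and GPE; only PERSON is annotated.
    predicted = ["U-ORG", "O", "B-PERSON", "L-PERSON", "O", "U-GPE"]
    gold_missing = ["-", "-", "B-PERSON", "L-PERSON", "-", "-"]
    gold_outside = ["O", "O", "B-PERSON", "L-PERSON", "O", "O"]
    print(masked_errors(predicted, gold_missing))
    print(masked_errors(predicted, gold_outside))
```

With missing values the count is `0`, so the correct ORG and GPE predictions aren't punished. With every unannotated token treated as "O" the count is `2`, because the existing ORG and GPE predictions now look like mistakes.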
