KeyError: 'U-STRIKE' when training on a new entity

Stephan · January 16, 2018, 4:09pm

Hello Explosion,

first I want to thank you for the fantastic work. I just started working with spacy and prodigy and I love it so far. Especially the documentation of spacy is outstanding and a pleasure to read.

I currently want to train a new entity and followed the video [1] to do so. It all works fine up to the last step. But then I noticed that my new model only knows this new entity and no longer detects any of the other entities, which is probably due to the --label STRIKE in:

prodigy ner.batch-train strikes_ner en_core_web_lg --output strikes-model --label STRIKE --eval-split 0.2 --n-iter 6 --batch-size 8

So I left this parameter out in order to train the model on all entities, as the documentation says:

If not set, all available labels will be used.

But this leads to the above-mentioned KeyError: 'U-STRIKE'. Can you please help me with that?

Best, Stephan

1: Named Entity Recognition · Prodigy · An annotation tool for AI, Machine Learning & NLP

honnibal · January 16, 2018, 4:18pm

Thanks, glad you like it!

That error is definitely strange. Just to rule out one problem: you’re on Prodigy 1.2 and spaCy 2.0.5, right?

If you add the label manually in the batch_train recipe using nlp.entity.add_label('STRIKE'), does the same problem occur?

ines · January 16, 2018, 4:57pm

To add to Matt's comment above:

What you're seeing here might also be due to the "catastrophic forgetting" problem: as you're teaching the model about your new entity, it's overfitting on the data and "forgetting" what it had previously learned. So if you care about the other entity types, a simple solution is to mix in entities that the model previously recognised correctly. This is pretty easy to do with both Prodigy and spaCy. You can find more details and strategies in this thread:

Stephan · January 17, 2018, 8:29am

Hello you two,

thanks a lot for the quick answers.

@honnibal Yes, I am on the latest versions; I just installed them yesterday.

(py3) ~: pip freeze | grep 'spacy\|prodigy'
prodigy==1.2.0
spacy==2.0.5

As you suggested I added the line nlp.entity.add_label('STRIKE') to ner.py:327 and the problem still persists.

UPDATE: I added nlp.entity.add_label('STRIKE') to ner.py:333 (as I’m of course not creating a new pipe) and now it works. Thanks a lot.
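For readers wondering why registering the label makes the error go away, here is a toy sketch of the likely mechanism. It is not spaCy's actual implementation: `build_tag_table` and the label lists are hypothetical names made up for illustration. The assumption is that the entity recognizer keeps a table of BILUO tags (B-, I-, L-, U- plus O) for the labels it knows, so a gold tag such as U-STRIKE (a single-token STRIKE entity) cannot be looked up until STRIKE has been added.

```python
# Toy illustration only: this is NOT spaCy's real code, just a model of the failure.

def build_tag_table(labels):
    """Map every BILUO tag for the given labels to an integer id (hypothetical helper)."""
    tags = ["O"]
    for label in labels:
        for prefix in ("B", "I", "L", "U"):
            tags.append(f"{prefix}-{label}")
    return {tag: i for i, tag in enumerate(tags)}


# Labels the pretrained model is assumed to know already (shortened for the example).
known_labels = ["PERSON", "ORG"]
table = build_tag_table(known_labels)

# Gold tags for a sentence containing a single-token STRIKE entity.
gold_tags = ["O", "U-STRIKE", "O"]

missing = None
try:
    ids = [table[tag] for tag in gold_tags]
except KeyError as err:
    # The lookup fails on the tag of the label that was never registered.
    missing = err.args[0]

# After registering the label (the analogue of add_label), the lookup succeeds.
table = build_tag_table(known_labels + ["STRIKE"])
ids = [table[tag] for tag in gold_tags]
```

In this sketch the first lookup raises a KeyError for the unregistered tag and the second one succeeds, which matches the behaviour described in the update above.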

@ines Thanks a lot for your input too. I’m going to read this article now. In any case I’m a bit confused by the documentation now. It made sense to me that my model only knew about one label if I use the --label parameter to teach it only about that one, and that it would not overfit on that one if I leave the label empty, so that it trains on the new one while not forgetting the old ones. Or is that a mistake in my thinking, and even when leaving out --label it would forget the old ones if I don’t explicitly give it examples of the previously learned entities?
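As a rough sketch of the "mix in" strategy ines describes, the snippet below combines new STRIKE annotations with examples the existing model already labels correctly. It is plain Python, not a Prodigy recipe: `mix_examples`, the `ratio` parameter and the example sentences are all made up for illustration. The only thing borrowed from Prodigy is the task layout with "text" and "spans" (start, end, label).

```python
import random


def mix_examples(new_examples, old_examples, ratio=1.0, seed=0):
    """Return new-label examples mixed with previously correct ones (hypothetical helper).

    ratio is the number of old-entity examples to add per new example.
    """
    rng = random.Random(seed)
    n_old = min(len(old_examples), int(len(new_examples) * ratio))
    revision = rng.sample(old_examples, n_old)
    mixed = list(new_examples) + revision
    rng.shuffle(mixed)  # avoid feeding all new-label examples in one block
    return mixed


# Hypothetical annotations for the new label.
new_examples = [
    {"text": "Workers walked out at the plant.",
     "spans": [{"start": 8, "end": 18, "label": "STRIKE"}]},
]

# Hypothetical predictions from the original model that were checked and found correct.
old_examples = [
    {"text": "Apple hired Tim Cook.",
     "spans": [{"start": 0, "end": 5, "label": "ORG"},
               {"start": 12, "end": 20, "label": "PERSON"}]},
    {"text": "She moved to Berlin.",
     "spans": [{"start": 13, "end": 19, "label": "GPE"}]},
]

mixed = mix_examples(new_examples, old_examples, ratio=2.0)
labels = {span["label"] for example in mixed for span in example["spans"]}
```

The mixed list then contains both the new label and the older ones, so training sees reminders of what the model previously got right. How many old examples to mix in is a judgment call that this sketch does not settle.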
