NER Annotation with ner.teach

fmobrj75 · February 3, 2019, 12:48pm

I have a question about multi-term entities. I want to create a NER model to identify products in texts based on customer complaints, covering both specific product names (e.g. waze, spotify, iphone 6S 32Gb, Ford Taurus, Barbie Doll, Smart TV LG 43 43uiS40) and more general products such as smart tv or doll when they appear alone, without further specification. The problem is that when I use ner.teach and SMART TV LG 43 appears in the text, prodigy tends to highlight only smart tv. If I accept, it reinforces that behaviour and will always show only smart tv and never highlight the full product SMART TV LG 43. Am I doing something wrong?

ines · February 4, 2019, 11:06am

Hi! If you want to use binary annotation with a model in the loop, you’re always giving feedback on the suggestion in this exact context. So you should definitely reject incomplete spans. This way, you’re telling the model “no, this particular analysis of the text is incorrect”. The weights will be updated to reflect that particular decision and the model will “try again” with a different analysis, hopefully moving towards more correct entity boundaries.
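To make the accept/reject rule concrete, here is a minimal sketch of the decision an annotator makes for each suggestion. It is not Prodigy code: the `judge_suggestion` helper and the example sentence are hypothetical, and the task dict only mirrors the general shape of an annotated example (text, spans with character offsets, answer).

```python
# Hypothetical example sentence; the full product is "SMART TV LG 43".
text = "My SMART TV LG 43 stopped working"

# Character offsets of the complete entity, as a human would mark it.
full_start = text.index("SMART TV LG 43")
full_end = full_start + len("SMART TV LG 43")
complete_entities = {(full_start, full_end)}


def judge_suggestion(start, end, complete_entities):
    """Accept only if the suggested boundaries match a complete entity exactly."""
    return "accept" if (start, end) in complete_entities else "reject"


# The model suggests only the partial span "SMART TV".
partial_start = text.index("SMART TV")
partial_end = partial_start + len("SMART TV")

task = {
    "text": text,
    "spans": [{"start": partial_start, "end": partial_end, "label": "PRODUCT"}],
    "answer": judge_suggestion(partial_start, partial_end, complete_entities),
}
```

Here the partial span gets a reject even though "smart tv" on its own would be a valid product elsewhere, because in this context the entity is longer.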

That said, if your data contains a lot of fairly abstract multi-token entities like that and the model struggles, it might take pretty long until it converges (or it might not converge at all). You could try adding some --patterns, or collect a small set with ner.manual or ner.make-gold that covers the especially complex entities, pre-train the model with that and then improve that pre-trained model further with ner.teach. You might also want to check out this thread, which discusses an approach for extending entity boundaries with rules: Expanding NER to include neighbouring tokens
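As a rough illustration of the --patterns suggestion, the sketch below builds a JSONL patterns file from a list of known product phrases, one token-based pattern per line. The phrase list, the helper names and the output file name are assumptions for illustration. Splitting on whitespace only approximates spaCy's tokenizer, so patterns for strings like "6S" or "32Gb" may need adjusting to match how the tokenizer actually splits them.

```python
import json
from pathlib import Path

# Hypothetical seed phrases taken from the examples in the question.
PHRASES = ["smart tv", "Smart TV LG 43", "Barbie Doll", "Ford Taurus"]


def phrase_to_pattern(phrase, label="PRODUCT"):
    """Turn a phrase into a case-insensitive token pattern (one dict per token)."""
    return {
        "label": label,
        "pattern": [{"lower": token.lower()} for token in phrase.split()],
    }


def to_jsonl(phrases, label="PRODUCT"):
    """Serialize all phrases as newline-delimited JSON, one pattern per line."""
    return "\n".join(json.dumps(phrase_to_pattern(p, label)) for p in phrases) + "\n"


if __name__ == "__main__":
    # File name is an assumption; pass whatever path you write to --patterns.
    Path("product_patterns.jsonl").write_text(to_jsonl(PHRASES), encoding="utf-8")
```

The resulting file could then be passed along the lines of `prodigy ner.teach your_dataset en_core_web_sm your_data.jsonl --label PRODUCT --patterns product_patterns.jsonl`, where the dataset, model and source names are placeholders. Including the longer variants as patterns gives the full spans a chance to be suggested, not only the short generic ones.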
