Named entity recognition - phrases

lubalesuan · November 14, 2018, 8:20pm

I used this tutorial to train my model to recognize a new entity. I would like the model to recognize not only single words but phrases as well. Are named entities limited to one word? Is there a way to create my own custom training and test datasets instead of using the Reddit corpus?

Thank you.
Luba

lubalesuan · November 14, 2018, 8:22pm

Also, I used displaCy to display the entities from the text I uploaded. I got a pretty odd result:

ines · November 14, 2018, 11:20pm

Sure, you can create your own datasets, that's what Prodigy is for. In the example, we're using data we've downloaded from the Reddit Comments corpus as the input data, because it's freely available and nice to work with. But you can use any text you have, in any format.
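
For example, one simple way to prepare your own text is to write it out as newline-delimited JSON (JSONL) with one `"text"` record per line, which is one of the input formats Prodigy can read. This is only a minimal sketch: the helper name `to_jsonl` and the sample sentences in `my_texts` are made up for illustration.

```python
import json


def to_jsonl(texts):
    """Serialize raw strings as JSONL: one {"text": ...} record per line.

    Hypothetical helper for illustration, not part of Prodigy or spaCy.
    """
    lines = []
    for text in texts:
        text = text.strip()
        if text:  # skip empty entries
            lines.append(json.dumps({"text": text}, ensure_ascii=False))
    return "\n".join(lines)


# Made-up example sentences standing in for your own corpus.
my_texts = [
    "The new Acme Cloud Suite was announced today.",
    "   ",
    "I tried the Acme Cloud Suite and liked it.",
]

jsonl = to_jsonl(my_texts)
print(jsonl)
```

You could then save that string to a `.jsonl` file and use it as the source instead of the Reddit data.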

No, entities aren't limited to one word: they can consist of one or more tokens. One token can only be part of one entity, but many entities, like person names or company names, typically span several tokens. If you want your model to learn multi-word entities, your training data needs to contain enough examples of them.
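
To make that concrete, here's a rough sketch of how an entity annotated as a character span maps onto tokens. It uses plain whitespace splitting rather than spaCy's real tokenizer, and the function name `biluo_tags`, the example sentence and its spans are all made up for illustration. The tags follow the BILUO scheme (Begin, In, Last, Unit, Out), so a multi-word entity gets a `B-`/`L-` pair and a single-token entity gets `U-`.

```python
def biluo_tags(text, spans):
    """Tag whitespace tokens with BILUO labels from (start, end, label) spans.

    Simplified illustration: real tokenization is more involved than split().
    """
    # Record the character offsets of every whitespace-separated token.
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((start, end, word))
        pos = end

    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        # Tokens that lie fully inside the annotated character span.
        covered = [
            i for i, (s, e, _) in enumerate(tokens)
            if s >= span_start and e <= span_end
        ]
        if not covered:
            continue
        # A token can only be part of one entity.
        if any(tags[i] != "O" for i in covered):
            raise ValueError("overlapping entity spans")
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label
        else:
            tags[covered[0]] = "B-" + label
            for i in covered[1:-1]:
                tags[i] = "I-" + label
            tags[covered[-1]] = "L-" + label
    return [(word, tag) for (_, _, word), tag in zip(tokens, tags)]


text = "Apple hired Tim Cook in 1998"
spans = [(0, 5, "ORG"), (12, 20, "PERSON")]  # "Apple", "Tim Cook"
tagged = biluo_tags(text, spans)
for word, tag in tagged:
    print(word, tag)
```

Here "Tim Cook" is one entity spread over two tokens, while "Apple" is a single-token entity.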

Could you share an example of your training data and how you trained your model? What you're seeing here can happen if you train from a pre-trained model but your new data didn't contain any examples of any of the previous entities.
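
A quick way to check for this is to count which labels actually occur in your annotations and compare them against the labels the pre-trained model already predicted. The snippet below is just a sketch under assumptions: the `(text, spans)` example format, the `PRODUCT` annotations and the `previous_labels` set are placeholders, so swap in your own data and your model's real label set.

```python
from collections import Counter


def label_counts(examples):
    """Count how often each entity label occurs in (text, spans) examples."""
    counts = Counter()
    for _text, spans in examples:
        for _start, _end, label in spans:
            counts[label] += 1
    return counts


# Made-up training examples that only contain the new entity type.
examples = [
    ("Acme Cloud Suite is out", [(0, 16, "PRODUCT")]),
    ("Try the Acme Cloud Suite today", [(8, 24, "PRODUCT")]),
]

# Assumption: labels the pre-trained model predicted before the update.
previous_labels = {"PERSON", "ORG", "GPE"}

counts = label_counts(examples)
missing = sorted(previous_labels - set(counts))
print("label counts:", dict(counts))
print("previous labels with no examples:", missing)
```

If every previous label shows up as missing, the training data looks like the situation described above.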
