Documentation NER

KiranParasa · May 28, 2020, 3:20am

Hi,

I'm having a little difficulty with NER. I believe the system determines verbs and other parts of speech from an English dictionary. But for some reason, very basic nouns (which in most cases can't be anything else) are being tagged as verbs, giving entirely different outcomes. Maybe we are doing something wrong; any help would be appreciated.

Secondly, I can help out with writing documentation or experiments/tests if needed.

Best regards,

ines · May 28, 2020, 5:27pm

Hi! I'm not sure I fully understand the question or the problem – you mention verbs and nouns, but that'd be the part-of-speech tagger and not the entity recognizer? Those components are entirely separate in the spaCy pipeline.

How well a model performs on your data ultimately depends on how similar the data is to what the model was trained on. It's less about the English dictionary and more about the training data. If a model doesn't perform well on your data, you can fine-tune it on more examples – for instance, using the pos.correct recipe to improve the part-of-speech tagger: https://prodi.gy/docs/recipes#pos-teach

KiranParasa · May 28, 2020, 8:14pm

Hi Ines, Thanks for the response.
We were trying to extract names from documents, and the spaCy entity recognizer missed some common proper nouns. When we dug a little further, we found that the part-of-speech tagger had marked the nouns as verbs. My question is: for this use case, must we also train the system? We used the common example programs given on the spacy.io page with our sample texts.

I was under the impression that if the part of speech is not tagged properly, it will lead to wrong entity labelling. Maybe my question is more related to spaCy's initial parsing. Please correct and guide me.

Thanks again.

ines · May 29, 2020, 10:07am

The part-of-speech tagger and named entity recognizer are separate and part-of-speech tags are not used as features in the NER model. If both the tagger and entity recognizer struggle with your example, it could be an indicator that your texts are quite different from what the model was trained on.

If you're building a system for a custom use case, you typically want to train your own model, yes. An arbitrary pretrained model you download will always be limited by the data it was trained on – typically some general-purpose corpus. Training a custom model is where NLP gets really powerful and lets you solve your specific problems.

Btw, note that this is the forum for our annotation tool Prodigy, not a general-purpose forum for spaCy. Topics here sometimes cross over, as Prodigy integrates spaCy, but we're not able to answer spaCy usage questions on here.

KiranParasa · May 29, 2020, 5:39pm

Thanks for the reply Ines.

Related topics

Topic	Tags	Replies	Views	Date
Linguistic features configured for a non-english model	usage, spacy, solved	2	470	January 11, 2019
Does spacy NER model use POS for modelling enhancement	ner, spacy	3	1222	October 25, 2018
Model tagging all texts as labels	usage, ner	1	409	July 16, 2019
Custom NER Tag for english	ner, spacy	1	1612	July 24, 2018
New language model for NER	usage, ner, spacy, solved	2	572	September 17, 2019