Linguistic features configured for a non-English model

gylling · January 10, 2019, 3:34pm

When teaching a new blank non-English model, it seems that the combination of Prodigy and spaCy isn't taking the result of the POS tagger into account when presenting words as suggestions. For instance, when looking for names for the “PERSON” label, the various names are correctly tagged as “PROPN” by the POS tagger, but this feature is not used.

Is that to be expected? Or where should I look to correct this and verify exactly which features the pipeline enables?

ines · January 10, 2019, 10:37pm

spaCy's model components are separate and don't share any features, so the part-of-speech tags have no influence on the named entity recognizer and vice versa.

Using the POS tagger's predictions to bootstrap the entity recognizer is a nice idea, though, and something you could do via a custom recipe. For example, your stream could process the text, extract all spans of consecutive proper nouns and accept/reject whether they're entities or not.
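A minimal sketch of that idea, assuming the text has already been tagged. With spaCy you would read `token.text`, `token.pos_` and `token.idx` from each token of `nlp(text)`; here plain tuples stand in for those tokens so the example runs on its own. The helper names `propn_spans` and `make_tasks` are hypothetical. The task dicts use Prodigy's JSON shape of a `"text"` plus character-offset `"spans"`, with one span per task so each candidate can be accepted or rejected on its own.

```python
from typing import Iterator, List, Tuple

# Hypothetical stand-in for a tagged token: (token.text, token.pos_, token.idx)
Token = Tuple[str, str, int]


def propn_spans(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return (start_char, end_char) offsets of runs of consecutive PROPN tokens."""
    spans = []
    run_start = None
    run_end = None
    for text, pos, idx in tokens:
        if pos == "PROPN":
            if run_start is None:
                run_start = idx          # a new run of proper nouns begins
            run_end = idx + len(text)    # extend the run to this token's end
        elif run_start is not None:
            spans.append((run_start, run_end))
            run_start = None
    if run_start is not None:            # close a run that ends the text
        spans.append((run_start, run_end))
    return spans


def make_tasks(text: str, tokens: List[Token], label: str = "PERSON") -> Iterator[dict]:
    """Yield one binary accept/reject task per proper-noun span."""
    for start, end in propn_spans(tokens):
        yield {
            "text": text,
            "spans": [{"start": start, "end": end, "label": label}],
        }


if __name__ == "__main__":
    text = "Anna Hansen lives in Aarhus."
    tokens = [("Anna", "PROPN", 0), ("Hansen", "PROPN", 5), ("lives", "VERB", 12),
              ("in", "ADP", 18), ("Aarhus", "PROPN", 21), (".", "PUNCT", 27)]
    for task in make_tasks(text, tokens):
        span = task["spans"][0]
        print(text[span["start"]:span["end"]], span["label"])
```

Here "Anna Hansen" and "Aarhus" both come out as PERSON candidates, and the annotator would reject the second. In a real custom recipe, a generator like this would feed the stream returned by a function decorated with `@prodigy.recipe`; the exact wiring depends on your Prodigy version.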

Alternatively, you could also use the POS information in your match patterns to narrow down the selection. For example, suggest a token "apple" as an ORG, but only if it's tagged as a proper noun.

{"label": "ORG", "pattern": [{"lower": "apple", "pos": "PROPN"}]}

If your tagger is good, this can really speed things up and improve the selection of examples to annotate.
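To make the effect of such a pattern concrete, here is a plain-Python sketch of the check it expresses: a token only matches if every attribute in its spec matches. This is not spaCy's `Matcher`, which supports many more attributes and operators; the `(text, pos)` tuples and the `token_matches` / `find_matches` helpers are hypothetical stand-ins that only handle `lower` and `pos`.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical stand-in for a tagged token: (token.text, token.pos_)
Token = Tuple[str, str]

SUPPORTED = {"lower", "pos"}


def token_matches(token: Token, spec: Dict[str, str]) -> bool:
    """True only if the token satisfies every attribute in the spec (logical AND)."""
    text, pos = token
    unsupported = {key.lower() for key in spec} - SUPPORTED
    if unsupported:
        raise ValueError(f"attributes not handled by this sketch: {unsupported}")
    for key, value in spec.items():
        key = key.lower()
        if key == "lower" and text.lower() != value:
            return False
        if key == "pos" and pos != value:
            return False
    return True


def find_matches(tokens: List[Token], entry: dict) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) token spans where the whole pattern matches in sequence."""
    pattern = entry["pattern"]
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        if all(token_matches(tokens[i + j], pattern[j]) for j in range(n)):
            hits.append((i, i + n, entry["label"]))
    return hits


if __name__ == "__main__":
    entry = json.loads('{"label": "ORG", "pattern": [{"lower": "apple", "pos": "PROPN"}]}')
    fruit = [("I", "PRON"), ("ate", "VERB"), ("an", "DET"), ("apple", "NOUN")]
    company = [("Apple", "PROPN"), ("released", "VERB"), ("a", "DET"), ("phone", "NOUN")]
    print(find_matches(fruit, entry))    # the NOUN "apple" is skipped
    print(find_matches(company, entry))  # the PROPN "Apple" is suggested as ORG
```

The fruit sentence produces no suggestion because its "apple" is tagged NOUN, which is exactly how a decent tagger filters out candidates you would otherwise have to reject by hand.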

gylling · January 11, 2019, 6:45am

Thanks for the quick answer. I will look into a custom recipe then.

From the documentation here, I got the impression that the named entity recognizer depended on running after the POS tagger and the dependency parser:

Nice to know that they are independent, as I can then skip the dependency parser for now.
