NER with Gazetteer

kbyatnal · January 31, 2018, 5:19pm

The accuracy of NER is often greatly increased by using a gazetteer as an input to the model. Are there any plans on the roadmap to add this to Prodigy?

honnibal · January 31, 2018, 10:31pm

The ner.teach command supports using a patterns file alongside the statistical model. The patterns file can contain basic literal matches as well as more advanced patterns that use quantifiers, POS tags, dependency labels, etc.
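If the gazetteer already exists as a plain list of names, a patterns file can be generated from it with a few lines of Python. The sketch below assumes the JSONL patterns format (one JSON object per line, with a `label` and a `pattern` given as a list of token attribute dicts). The gazetteer entries, the `ORG` label, and the helper names `gazetteer_to_patterns` and `patterns_to_jsonl` are made up for illustration.

```python
import json


def gazetteer_to_patterns(names, label):
    """Turn gazetteer entries into pattern dicts, one per entry.

    Each entry becomes a case-insensitive token pattern. This assumes the
    entry can be split on whitespace, which will not match the tokenizer
    for entries containing punctuation.
    """
    patterns = []
    for name in names:
        tokens = [{"lower": tok.lower()} for tok in name.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


def patterns_to_jsonl(patterns):
    """Serialize pattern dicts as JSONL text (one JSON object per line)."""
    return "\n".join(json.dumps(p) for p in patterns) + "\n"


# Hypothetical gazetteer entries, purely for illustration.
gazetteer = ["Explosion AI", "Acme Corp"]
patterns = gazetteer_to_patterns(gazetteer, "ORG")
jsonl_text = patterns_to_jsonl(patterns)
print(jsonl_text)
```

The resulting text can be saved to a file and passed to ner.teach via `--patterns`. Entries with punctuation or hyphens would need to be split the same way the tokenizer splits them, so whitespace splitting is only a starting point.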

The interaction between the patterns file and the statistical model isn’t exactly the same as a gazetteer, though. The patterns file is used to suggest entities, which you then accept or reject. Your answers to these pattern-matched entities are used to train the statistical model. They also affect the match score of each pattern, which lets Prodigy assign low scores to patterns that are usually rejected and high scores to patterns that are usually accepted.
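To make the scoring idea concrete, here is a toy sketch of how accept/reject answers could move a pattern's score. It is not Prodigy's actual implementation: the `PatternStats` class, the smoothed acceptance rate, and the prior values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    """Accept/reject counts for one pattern (hypothetical bookkeeping)."""

    accepted: int = 0
    rejected: int = 0

    def update(self, answer: str) -> None:
        # Only "accept" and "reject" carry signal; other answers are ignored.
        if answer == "accept":
            self.accepted += 1
        elif answer == "reject":
            self.rejected += 1

    def score(self, prior_accept: float = 1.0, prior_reject: float = 1.0) -> float:
        # Smoothed acceptance rate. With equal priors, a pattern with no
        # answers yet scores 0.5 (assumed priors, not Prodigy's values).
        total = self.accepted + self.rejected + prior_accept + prior_reject
        return (self.accepted + prior_accept) / total


good, bad = PatternStats(), PatternStats()
for answer in ["accept", "accept", "accept", "reject"]:
    good.update(answer)
for answer in ["reject", "reject", "reject", "accept"]:
    bad.update(answer)

print(round(good.score(), 2), round(bad.score(), 2))
```

A pattern that is mostly accepted drifts towards 1.0 and one that is mostly rejected drifts towards 0.0, which is the behaviour described above.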

The overall idea here is to use the gazetteer to train the statistical model, instead of using it inside the entity recognizer. If you want a pure gazetteer entity recognition component, you can use spaCy’s Matcher or PhraseMatcher classes: https://spacy.io/usage/linguistic-features#section-rule-based-matching. You could add a matcher instance to your pipeline, before the entity recognizer, like this:


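import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

def gazetteer_component(doc):
    # Components must return the Doc; the matcher returns matches, so wrap it
    matcher(doc)
    return doc

nlp.add_pipe(gazetteer_component, before='ner')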

You would then add patterns to the matcher, with a callback that adds each match to the Doc as a named entity. The subsequent statistical NER is constrained by these existing entities: it can’t propose any entities that overlap or overwrite the ones that are already set.
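The effect of that constraint can be illustrated without spaCy at all. The sketch below is a toy post-hoc filter that mimics the outcome: entities set by the gazetteer always survive, and a proposed entity is kept only if it doesn't overlap any of them. It makes no claim about how spaCy implements this internally; the `Span` tuple layout and the `merge_entities` helper are hypothetical.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); a made-up layout for this sketch.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge_entities(preset: List[Span], proposed: List[Span]) -> List[Span]:
    """Keep every preset span; keep a proposed span only if it overlaps none."""
    kept = [p for p in proposed if not any(overlaps(p, g) for g in preset)]
    return sorted(preset + kept)


# Gazetteer match covering tokens 0-1, set before the statistical model runs.
preset = [(0, 2, "ORG")]
# Hypothetical model proposals: the first collides with the preset entity.
proposed = [(1, 3, "PERSON"), (5, 6, "GPE")]

final = merge_entities(preset, proposed)
print(final)
```

Here the `PERSON` proposal is dropped because it shares token 1 with the gazetteer's `ORG` entity, while the non-overlapping `GPE` proposal is kept.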
