Excluding patterns for NER

ines · May 9, 2019, 1:49pm

Hi! This sounds similar to the antipatterns request here:

We don't currently have that implemented out of the box, but you could add a filter to the stream at the very end of the recipe that explicitly doesn't send out an example if the span text is part of an exclude list. For example:

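def filter_stream(stream):
    # Span texts that should never be shown to the annotator
    exclude_list = {"with", ".", ","}  # etc.
    for eg in stream:
        spans = eg.get("spans", [])
        # Pass examples without a suggested span through unchanged
        if not spans:
            yield eg
            continue
        span = spans[0]
        if eg["text"][span["start"]:span["end"]] not in exclude_list:
            yield eg

# At the end of the recipe
stream = filter_stream(stream)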
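If you want to try the idea outside a recipe first, here's a standalone sketch that checks every span in an example (not just the first) and compares lower-cased text, so "With" is skipped as well as "with". The exclude terms, the FOOD label and the sample stream are made up for illustration; only the task shape (a "text" plus "spans" with character offsets) follows Prodigy's format.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical exclude list, stored lower-cased so matching ignores case
EXCLUDE = frozenset({"with", ".", ","})


def filter_stream(stream: Iterable[Dict], exclude=EXCLUDE) -> Iterator[Dict]:
    """Yield only examples where no suggested span text is in the exclude list."""
    for eg in stream:
        text = eg["text"]
        span_texts = [text[s["start"]:s["end"]].lower() for s in eg.get("spans", [])]
        # Skip the whole example if any span is an excluded string
        if any(t in exclude for t in span_texts):
            continue
        yield eg


# A tiny hand-made stream (hypothetical data, FOOD is a made-up label)
stream = [
    {"text": "Coffee with milk", "spans": [{"start": 7, "end": 11, "label": "FOOD"}]},
    {"text": "Coffee with milk", "spans": [{"start": 12, "end": 16, "label": "FOOD"}]},
    {"text": "No suggestion here", "spans": []},
]

kept = list(filter_stream(stream))
for eg in kept:
    print(eg["text"], eg["spans"])
```

Because the filter is a generator, it stays lazy like the rest of the stream; the first example ("with") is dropped and the other two pass through.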

However, this also means that you won't get to annotate those examples at all. From what you describe, it sounds like your model is a bit "lost" and possibly doesn't see enough positive examples, so it keeps suggesting very random tokens over and over again. Are you able to add more patterns to help bootstrap the suggestions? Alternatively, your use case may just need the model to be pre-trained more before you start annotating with the model in the loop. So you might want to experiment with doing some manual annotation first, so the model knows at least something about the entity type.
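For the "more patterns" route: a patterns file is JSONL with one {"label": ..., "pattern": ...} object per line, which you can pass to ner.teach via --patterns. The sketch below generates one from a list of seed terms. The DRINK label, the terms and the file name are hypothetical, and splitting on whitespace only approximates spaCy's tokenizer, so terms containing punctuation may need hand-written token patterns.

```python
import json

# Hypothetical label and seed terms: replace with your own entity type
LABEL = "DRINK"
seed_terms = ["coffee", "green tea", "espresso"]


def make_patterns(label, terms):
    """Build one token pattern per term, matching each token case-insensitively.

    Whitespace splitting is an assumption that only approximates spaCy's tokenizer.
    """
    patterns = []
    for term in terms:
        tokens = [{"lower": tok} for tok in term.lower().split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


patterns = make_patterns(LABEL, seed_terms)

# Write one JSON object per line (JSONL); the file name is hypothetical
with open("drink_patterns.jsonl", "w", encoding="utf8") as f:
    for p in patterns:
        f.write(json.dumps(p) + "\n")
```

Multi-word terms become one token dict per word, so "green tea" only matches the two tokens in sequence.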
