NER exclusion patterns

eaubin · April 11, 2019, 6:34pm

Does anyone have suggestions for using a pattern file to help with training on things that do not have an entity label? I'm training an NER model and find I'm often prompted to classify things like spaces, numbers or punctuation that are never entities.

ines · April 12, 2019, 9:39am

Hi! The pattern files are currently only intended to help with generating positive candidates – but you could always add a filter function that only yields out examples with spans that do not match your exclusion pattern. Here's a simple example that shows the idea (but of course, you can use more sophisticated logic here):

def filter_tasks(stream):
    for eg in stream:
        # ner.teach tasks highlight a single suggested span, so check
        # that span and only send the example out if it doesn't
        # match your exclusion list
        span = eg["spans"][0]
        if span["text"] not in ("\n", ".", ","):  # etc.
            yield eg
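The tuple above only catches a few exact strings. To cover the categories from the question (whitespace, numbers and punctuation) more generally, you could test the span text against character classes instead. This is a minimal sketch, assuming each task carries the same "spans" list with a "text" key used above; `is_excluded` and the sample stream are hypothetical, for illustration only.

```python
import string
from typing import Dict, Iterable, Iterator


def is_excluded(text: str) -> bool:
    """Hypothetical helper: True if the span is never an entity."""
    stripped = text.strip()
    if not stripped:
        return True  # empty or whitespace only, e.g. "\n" or "  "
    if all(ch in string.punctuation for ch in stripped):
        return True  # ".", ",", "--", "?!" and so on
    # Simple numbers such as "42", "3.14" or "1,000"
    if stripped.replace(",", "").replace(".", "").isdigit():
        return True
    return False


def filter_tasks(stream: Iterable[Dict]) -> Iterator[Dict]:
    for eg in stream:
        spans = eg.get("spans", [])
        # Drop the task if its highlighted span is never an entity;
        # tasks without any spans are passed through unchanged
        if spans and is_excluded(spans[0]["text"]):
            continue
        yield eg


stream = [
    {"text": "Hello\nWorld", "spans": [{"text": "\n"}]},
    {"text": "Paris.", "spans": [{"text": "Paris"}]},
    {"text": "It costs 1,000", "spans": [{"text": "1,000"}]},
    {"text": "Hi!", "spans": [{"text": "!"}]},
]
print([eg["spans"][0]["text"] for eg in filter_tasks(stream)])  # ['Paris']
```

The number check is deliberately simple (it won't catch "-5" or "1e3"); a regular expression, or spaCy's token attributes if you already have a `Doc`, would be the natural next step.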

Btw, you might not always want to use a filter like this, especially not during development. If you’re using a recipe like ner.teach, Prodigy will stream in suggestions from the model with their scores assigned (see the meta section in the bottom right corner). So it can sometimes be very interesting to see what the model is predicting and what scores it’s assigning. For example, you might see that the model is very uncertain about some type of non-entity spans – this could indicate a problem with the data or pre-trained weights. If you filtered those examples out, you wouldn’t ever get to see those suggestions and scores, and you’d never be able to give the model feedback on those.
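If you do filter, one compromise is to keep a log of what was skipped, together with the model's score, so the model's behaviour on those spans is still visible. This is a minimal sketch, assuming the score shown in the meta section is stored under eg["meta"]["score"]; `filter_and_log`, `uncertain`, `EXCLUDE` and the 0.3–0.7 "uncertain" band are hypothetical choices for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

EXCLUDE = ("\n", ".", ",")  # same idea as the exclusion list above


def filter_and_log(stream: Iterable[Dict],
                   skipped: List[Tuple[str, float]]) -> Iterator[Dict]:
    for eg in stream:
        span = eg["spans"][0]
        if span["text"] in EXCLUDE:
            # Don't show the task, but remember the suggestion and its
            # score (assumes eg["meta"]["score"]; 0.0 if it's missing)
            skipped.append((span["text"], eg.get("meta", {}).get("score", 0.0)))
            continue
        yield eg


def uncertain(skipped: List[Tuple[str, float]],
              low: float = 0.3, high: float = 0.7) -> List[Tuple[str, float]]:
    # Hypothetical helper: skipped spans whose scores are close to 0.5
    return [(text, score) for text, score in skipped if low <= score <= high]


skipped: List[Tuple[str, float]] = []
stream = [
    {"text": "A.", "spans": [{"text": "."}], "meta": {"score": 0.48}},
    {"text": "Apple", "spans": [{"text": "Apple"}], "meta": {"score": 0.62}},
    {"text": "a,b", "spans": [{"text": ","}], "meta": {"score": 0.03}},
]
kept = list(filter_and_log(stream, skipped))
print([eg["spans"][0]["text"] for eg in kept])  # ['Apple']
print(uncertain(skipped))  # [('.', 0.48)]
```

Because the filter is a generator, the log only fills up as the stream is consumed. A punctuation span that keeps turning up with a score near 0.5 is the kind of signal that might point to a problem with the data or the pre-trained weights.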
