Feature Request: Antipatterns

There are some NER candidates that Prodigy suggests which I know a priori are wrong, for example candidates that consist entirely of whitespace.

The ner.teach recipe should have an “antipatterns” option that allows you to specify a set of patterns that will always be marked “reject” without showing them to the user.
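A minimal sketch of the requested behaviour, done as a filter over the task stream. It assumes Prodigy-style task dicts with a "text" field and a "spans" list of {"start", "end", "label"} entries. `ANTIPATTERNS`, `is_antipattern` and `filter_antipatterns` are hypothetical names for illustration, not part of Prodigy's API, and the punctuation-only regex is only an example antipattern.

```python
import re
from typing import Dict, Iterable, Iterator, List, Pattern

# Hypothetical antipatterns: regexes matched against the full text of a
# suggested span. A candidate that fully matches any of them is treated as wrong.
ANTIPATTERNS: List[Pattern[str]] = [
    re.compile(r"\s+"),     # whitespace-only spans
    re.compile(r"[\W_]+"),  # punctuation-only spans (illustrative)
]


def is_antipattern(span_text: str, antipatterns: Iterable[Pattern[str]]) -> bool:
    """Return True if the span text fully matches any antipattern."""
    return any(p.fullmatch(span_text) for p in antipatterns)


def filter_antipatterns(
    stream: Iterable[Dict], antipatterns: Iterable[Pattern[str]] = ANTIPATTERNS
) -> Iterator[Dict]:
    """Yield only tasks whose suggested spans match no antipattern.

    Assumes tasks shaped like {"text": ..., "spans": [{"start", "end", "label"}]}.
    Matching tasks are silently skipped; no "reject" answer is recorded.
    """
    antipatterns = list(antipatterns)
    for task in stream:
        text = task["text"]
        spans = task.get("spans", [])
        if any(is_antipattern(text[s["start"]:s["end"]], antipatterns) for s in spans):
            continue  # known-bad candidate: never shown to the annotator
        yield task


# Usage: the first task suggests the double space as an entity and is dropped.
tasks = [
    {"text": "Apple  Inc", "spans": [{"start": 5, "end": 7, "label": "ORG"}]},
    {"text": "Apple  Inc", "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
]
kept = list(filter_antipatterns(tasks))
print(len(kept))  # 1
```

Note that this only hides the candidates. Actually marking them "reject" so the model learns from them, as the request proposes, would need extra recipe code that passes those tasks to the update step with a reject answer.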


Hi @wpm,

This happens on some structured texts with the default models shipped with spaCy, and therefore also in Prodigy, which uses spaCy. I wrote a filter component that removes such candidates in spaCy. For Prodigy, however, I think the ner.teach recipe would need some customization. Maybe @honnibal can clarify this better.

@wpm: This thread discusses a similar problem, and has some code you might find useful: patterns using regex or shape

We’ll think about whether there should be explicit support for something like this, thanks.