Feature Request: Antipatterns


(W.P. McNeill) #1

There are some NER candidates Prodigy suggests that I know a priori are wrong, for example candidates that consist entirely of whitespace.

The ner.teach recipe should have an “antipatterns” option that allows you to specify a set of patterns that will always be marked “reject” without showing them to the user.
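To make the request concrete, here is a minimal plain-Python sketch of how such an option could behave as a stream filter. It is not an existing Prodigy option. It assumes tasks shaped like Prodigy's JSON task format (`"text"` plus `"spans"` with `"start"`/`"end"`/`"label"`). The names `ANTIPATTERNS` and `filter_antipatterns` are hypothetical. A task is auto-rejected only if every suggested span fully matches an antipattern regex.

```python
import re
from typing import Dict, Iterable, Iterator, List, Pattern

# Hypothetical antipatterns: a candidate span whose text fully matches
# any of these regexes is rejected without being shown to the annotator.
ANTIPATTERNS: List[Pattern[str]] = [
    re.compile(r"\s+"),     # whitespace only
    re.compile(r"[\W_]+"),  # punctuation/symbols only
]


def span_texts(task: Dict) -> List[str]:
    """Return the substring covered by each suggested span in a task."""
    text = task["text"]
    return [text[s["start"]:s["end"]] for s in task.get("spans", [])]


def is_antipattern(span_text: str, antipatterns: Iterable[Pattern[str]]) -> bool:
    """True if the span text fully matches at least one antipattern."""
    return any(p.fullmatch(span_text) for p in antipatterns)


def filter_antipatterns(
    stream: Iterable[Dict],
    antipatterns: Iterable[Pattern[str]],
    rejected: List[Dict],
) -> Iterator[Dict]:
    """Yield tasks to show; append auto-rejected tasks to `rejected`."""
    antipatterns = list(antipatterns)
    for task in stream:
        texts = span_texts(task)
        if texts and all(is_antipattern(t, antipatterns) for t in texts):
            # Mark as rejected instead of sending it to the annotation UI.
            rejected.append({**task, "answer": "reject"})
        else:
            yield task


# Usage: one whitespace-only suggestion and one real candidate.
stream = [
    {"text": "Call  ACME today", "spans": [{"start": 4, "end": 6, "label": "ORG"}]},
    {"text": "Call  ACME today", "spans": [{"start": 6, "end": 10, "label": "ORG"}]},
]
rejected: List[Dict] = []
shown = list(filter_antipatterns(stream, ANTIPATTERNS, rejected))
print(len(shown), len(rejected))
```

Whether the auto-rejected tasks should also be saved to the dataset or used to update the model is a separate design decision. This sketch only collects them.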

(Sandeep N) #2

Hi @wpm,

This happens on some structured texts with the default models shipped with spaCy, and therefore also in Prodigy, which uses spaCy. I wrote a filter component that filters out such patterns in spaCy. For Prodigy, however, I think the ner.teach recipe would need some specialization. Maybe @honnibal can clear this up better.

(Matthew Honnibal) #3

@wpm: This thread discusses a similar problem and has some code you might find useful: “patterns using regex or shape”
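As a rough illustration of the "regex or shape" idea, the sketch below rejects a candidate if its text fully matches a regex or if its word shape is in a banned set. The `word_shape` function is modeled on spaCy's token-shape convention (X = uppercase, x = lowercase, d = digit, other characters kept, runs cut after four). It is a standalone reimplementation, not spaCy's API. `REJECT_REGEXES`, `REJECT_SHAPES` and `is_rejected` are hypothetical names.

```python
import re
from typing import Iterable, Pattern, Set


def word_shape(text: str) -> str:
    """Approximate word shape: X/x for upper/lowercase letters, d for digits,
    other characters kept; a run of the same shape char is cut after 4."""
    shape = []
    last = ""
    seq = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        if shape_char == last:
            seq += 1
        else:
            seq = 0
            last = shape_char
        if seq < 4:
            shape.append(shape_char)
    return "".join(shape)


# Hypothetical antipatterns: whitespace-only text, or purely numeric shapes.
REJECT_REGEXES = [re.compile(r"\s+")]
REJECT_SHAPES = {"d", "dd", "ddd", "dddd"}


def is_rejected(
    span_text: str,
    regexes: Iterable[Pattern[str]] = REJECT_REGEXES,
    shapes: Set[str] = REJECT_SHAPES,
) -> bool:
    """True if the candidate fully matches a regex OR has a banned shape."""
    if any(p.fullmatch(span_text) for p in regexes):
        return True
    return word_shape(span_text) in shapes


for candidate in [" \n ", "2018", "ACME", "iPhone"]:
    print(repr(candidate), word_shape(candidate), is_rejected(candidate))
```

Shape checks catch whole classes of strings (such as any run of digits) that would need a more complicated regex, while regexes stay more precise for things like whitespace.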

We’ll think about whether there should be explicit support for something like this, thanks.