"Negative" pattern matching (RegEx)


I am currently trying to work out how to construct my pattern file. My intention is to avoid matching a specific pattern. Can this be done?

{"label": "CONDITION", "pattern": [{"TEXT": {"REGEX": "^(?!no)"}}, {"LOWER": "back"}, {"LOWER": "pain"}]}

What I am essentially trying to achieve is:
Match spans of text that correspond to “XXXXX back pain”, where XXXXX is NOT the word “no”. So, in essence, XXXXX could be anything else, such as “has”, “got”, “have”, etc.

I would like Prodigy to filter these spans to present only such patterns. Is this form of “negation” regex possible in Prodigy?


spaCy’s rule-based Matcher now supports a NOT_IN operator for attribute values, so instead of your regex, you could do something like "TEXT": {"NOT_IN": ["no", "not"]}. Or use "LOWER" instead of "TEXT" to make it case-insensitive.
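As a rough sketch, the suggestion above could be written out as a single pattern-file entry like this. It assumes the patterns file is JSONL (one JSON object per line) and that the installed spaCy version is recent enough to support NOT_IN; the "CONDITION" label and the "back pain" tokens are taken from the question.

```python
import json

# One pattern-file entry: any token other than "no"/"not",
# followed by "back" and "pain". LOWER makes every comparison
# case-insensitive, so "No back pain" is excluded as well.
pattern_entry = {
    "label": "CONDITION",
    "pattern": [
        {"LOWER": {"NOT_IN": ["no", "not"]}},
        {"LOWER": "back"},
        {"LOWER": "pain"},
    ],
}

# json.dumps emits the entry on a single line, which is what a
# JSONL file expects.
line = json.dumps(pattern_entry)
print(line)
```

Note that the first token still has to exist: the pattern matches three-token spans, so a bare "back pain" at the very start of a text would not match.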

If you do end up finding that you need more complex regular expressions and match logic, you could also consider implementing your own regex matcher that extracts spans from your text and presents them for annotation (see the post I linked here). So basically, don’t use spaCy’s token-based Matcher, and use re.finditer instead. The code should be pretty straightforward, because all you need are the start and end character offsets of each match, and you can get those easily from your regex matches.
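A minimal sketch of that regex-based approach might look like the following. The function name find_spans, the exact expression, and the sample sentence are illustrative assumptions, not code from the linked post; the "start"/"end"/"label" keys follow the span format Prodigy uses in its annotation tasks.

```python
import re

# Hypothetical expression: one word that is not "no" or "not",
# followed by "back pain". The negative lookahead (?!...) rejects
# only the whole words, so "nobody" or "notable" would still match.
NEGATED_PAIN = re.compile(
    r"\b(?!(?:no|not)\b)\w+\s+back\s+pain\b", re.IGNORECASE
)

def find_spans(text, label="CONDITION"):
    """Return character-offset spans for every match in the text."""
    return [
        {"start": m.start(), "end": m.end(), "label": label}
        for m in NEGATED_PAIN.finditer(text)
    ]

text = "She has back pain but he has no back pain."
task = {"text": text, "spans": find_spans(text)}
print(task["spans"])
```

Each resulting dict pairs the original text with its matched spans, so a stream of such dicts could be passed on for annotation. Because the offsets are character-based, they do not depend on spaCy's tokenization.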