"Negative" pattern matching (RegEx)

ines · July 10, 2019, 8:35am

spaCy’s rule-based Matcher now supports a NOT_IN attribute – so instead of your regex, you could do something like "TEXT": {"NOT_IN": ["no", "not"]}. Or even "LOWER" instead of "TEXT", to make it case-insensitive.
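As a rough sketch of what that pattern could look like in context: the snippet below assumes spaCy v3's `matcher.add(name, [pattern])` signature and a blank English pipeline, and both the example sentence and the `NOT_NEGATED_GOOD` label are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# First token: anything whose lowercase form is NOT "no" or "not".
# Second token: "good" (case-insensitive via LOWER).
pattern = [
    {"LOWER": {"NOT_IN": ["no", "not"]}},
    {"LOWER": "good"},
]
# "NOT_NEGATED_GOOD" is a hypothetical label name chosen for this example.
matcher.add("NOT_NEGATED_GOOD", [pattern])

doc = nlp("This is not good, but that is good.")
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

Note that the `NOT_IN` token still has to match some token, so a "good" at the very start of a text would not be found by this pattern. On spaCy v2.x, the call would be `matcher.add("NOT_NEGATED_GOOD", None, pattern)` instead.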

If you do end up finding that you need more complex regular expressions and match logic, you could also consider implementing your own regex matcher that extracts spans from your text and presents them for annotation (see the post I linked here). So basically, don’t use spaCy’s token-based Matcher, and use re.finditer instead. The code should be pretty straightforward, because all you need are the start and end character offsets of each match, and you can read those directly off the regex match objects.
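A minimal sketch of that idea, using only the standard library: the regex, the `find_spans` helper, the `POSITIVE` label and the dict layout of each span are assumptions for illustration, so adjust them to whatever your annotation recipe expects.

```python
import re

# "good" that is not directly preceded by "no " or "not ".
# Python lookbehinds must be fixed-width, hence two separate ones.
NEGATIVE_PATTERN = re.compile(r"(?<!\bno )(?<!\bnot )\bgood\b", re.IGNORECASE)


def find_spans(text, pattern=NEGATIVE_PATTERN, label="POSITIVE"):
    """Return one dict per regex match with its character offsets."""
    spans = []
    for match in pattern.finditer(text):
        spans.append({
            "start": match.start(),  # offset of the first matched character
            "end": match.end(),      # offset one past the last matched character
            "text": match.group(),
            "label": label,          # hypothetical label name
        })
    return spans


text = "This is not good, but that is good."
spans = find_spans(text)
print(spans)
```

Because the offsets are character-based, `text[span["start"]:span["end"]]` always gives back the matched string, which makes the spans easy to check before presenting them for annotation.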
