Match patterns without creating huge files

ines · March 21, 2019, 5:41pm

Yes, the pattern files support the same syntax as spaCy’s rule-based Matcher, so you can definitely write “smarter” token patterns. For example, here’s a pattern that matches the case-insensitive tokens “apple” and “iphone”, and an optional number token like “11”:

[{"LOWER": "apple"}, {"LOWER": "iphone"}, {"IS_DIGIT": True, "OP": "?"}]

There’s a lot more you can do with token attributes here. If you’re starting off with a pre-trained model, you could also use part-of-speech tags in your patterns. For example, you could match “love” only if it’s used as a verb and not as a noun.
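In case it helps, here’s a rough sketch of how those two patterns could end up in a patterns file. It assumes the usual JSONL layout with one entry per line, each with a "label" and a "pattern" key. The labels PRODUCT and LIKES and the helper name to_jsonl are made up for illustration, and the "POS" pattern will only match if the model you load actually assigns part-of-speech tags.

```python
import json

# Hypothetical labels, used here only for illustration.
entries = [
    {
        "label": "PRODUCT",
        # "apple" + "iphone" (case-insensitive) + an optional digit token
        "pattern": [
            {"LOWER": "apple"},
            {"LOWER": "iphone"},
            {"IS_DIGIT": True, "OP": "?"},
        ],
    },
    {
        "label": "LIKES",
        # "love" only when the model tags it as a verb
        "pattern": [{"LOWER": "love", "POS": "VERB"}],
    },
]


def to_jsonl(entries):
    """Serialize one JSON object per line (hypothetical helper)."""
    return "\n".join(json.dumps(entry) for entry in entries)


patterns_jsonl = to_jsonl(entries)
print(patterns_jsonl)
```

One thing to watch out for: Python’s True is written as true once it’s serialized to JSON, so if you’re writing the file by hand, use the lowercase form. A handful of token-based patterns like these can cover many surface forms, which is what keeps the file small.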

Important note: The rule-based matching docs also describe some new features like the extended pattern syntax that are only available in spaCy v2.1. Those are also marked with a little “2.1” tag. You’ll be able to use those once the new version of Prodigy for spaCy v2.1 is available – see here for details.
