We just published a (still experimental) new demo that should make it easier to create and test complex token match patterns. I've wanted a debugging tool like this for a while, so I finally decided to build it.
Each box on the left represents one token – you can add as many token attributes and tokens as you like, including operators (to repeat or negate tokens). You can then enter your text and view the matches, check spaCy's tokenization, open a displaCy visualization to verify spaCy's predictions, and even copy-paste the Python pattern to use in your code.
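As a rough illustration of what a copied pattern might look like once it's in your code, here's a minimal sketch. The pattern itself, the rule name "HELLO_WORLD" and the example text are made up for this post, and I'm using a blank English pipeline so no model download is needed. The `matcher.add` call below uses the spaCy v3 signature; on spaCy v2 it would be `matcher.add("HELLO_WORLD", None, pattern)`.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline only tokenizes, which is enough for lexical attributes
# like LOWER and IS_PUNCT.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One dict per token (one "box" in the demo). "OP": "?" makes the
# punctuation token optional.
pattern = [
    {"LOWER": "hello"},
    {"IS_PUNCT": True, "OP": "?"},
    {"LOWER": "world"},
]

# Hypothetical rule name; spaCy v3 signature (list of patterns).
matcher.add("HELLO_WORLD", [pattern])

doc = nlp("Hello, world! Hello world!")
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

Patterns that rely on predicted attributes such as POS or ENT_TYPE need a trained pipeline instead of a blank one.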
Prodigy's patterns follow the same format as spaCy's Matcher patterns, so you can also use the tool to test your patterns.jsonl for bootstrapping the entity recognizer or text classifier.
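To make the connection concrete, here's a small sketch of how token patterns could be turned into patterns.jsonl lines, with one JSON object per line holding a "label" and a "pattern". The FRUIT label, the example patterns and the `to_jsonl` helper are hypothetical and only there for illustration.

```python
import json

# Hypothetical entries: each one pairs a label with a list of token dicts,
# in the same shape as a Matcher pattern.
entries = [
    {"label": "FRUIT", "pattern": [{"lower": "apple"}]},
    {"label": "FRUIT", "pattern": [{"lower": "goji"}, {"lower": "berry"}]},
]


def to_jsonl(items):
    """Serialize each entry as one JSON object per line."""
    return "\n".join(json.dumps(item) for item in items)


jsonl_text = to_jsonl(entries)
print(jsonl_text)

# To save it for use with a recipe:
# with open("patterns.jsonl", "w", encoding="utf8") as f:
#     f.write(jsonl_text + "\n")
```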
(Disclaimer: As I mentioned above, the demo is still experimental. It also currently doesn't indicate overlapping matches. So if your pattern matches a span of several tokens and also matches one of the contained tokens separately, you'll only see the largest match.)
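If you want to reproduce the "largest match only" behaviour on your own match results, something like the following sketch would do it. This is not the demo's actual code, just one way to filter overlapping (start, end) token offsets; the `keep_longest` helper is a hypothetical name. Newer spaCy versions also ship `spacy.util.filter_spans`, which does a similar job on Span objects.

```python
def keep_longest(matches):
    """Keep the longest spans and drop any span that overlaps one already kept.

    `matches` is a list of (start, end) token offsets with an exclusive end.
    """
    # Longest spans first; ties are broken by the earlier start.
    ordered = sorted(matches, key=lambda m: (m[0] - m[1], m[0]))
    kept = []
    seen_tokens = set()
    for start, end in ordered:
        if not any(i in seen_tokens for i in range(start, end)):
            kept.append((start, end))
            seen_tokens.update(range(start, end))
    return sorted(kept)


# The single-token match (1, 2) sits inside (0, 3), so it is dropped.
filtered = keep_longest([(0, 3), (1, 2), (4, 5)])
print(filtered)
```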