Great to see the new `match` recipe!
- Directly below the introductory recipe text, the terminal command shows `prodigy mark` rather than `prodigy match`. Super minor, but I wanted to raise it in case it throws some folks.
- It looks like `ner.correct` doesn't have a `--patterns` argument. Was this reference in the `match` docs intended to refer to
- Is the only major difference between `match` and `ner.manual` with match patterns that `ner.manual` requires a model? Just making sure I wrap my head around the different functions!
Thanks for the heads-up – 1 and 2 were both typos / copy-paste mistakes, and I also noticed that the docs didn't list the `spacy_model` argument. So sorry if this made things confusing! I've already fixed this and it should be live in a second.
The main difference between the new `match` and `--patterns` is that `match` will only show you the matches, with different options for how to present them (and lets you accept or reject). If you use `--patterns`, you're still going through every single example and if a pattern matches, the match is pre-highlighted.
However, if what you want to do is find examples via matches, that type of workflow isn't a good fit. This was kind of a gap in the API that I noticed when working on a small project. For example, you might be working on a text classification project with very imbalanced categories that make it difficult to get over the "cold start problem". So you could start by using `match` with a few patterns to quickly find enough positive examples for your category, then pretrain a model on that dataset and improve it further, e.g. using `textcat.teach`. (An early version of the matching logic in `textcat.teach` tried to do this all in one by preprocessing the stream and starting with only matches – but this wasn't very transparent and a bit too "magical". So `match` lets you do this more explicitly in a separate step.)
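To make the distinction a bit more concrete, here's a rough plain-Python sketch of the two behaviours. This is not Prodigy's actual implementation: `find_spans`, `match_only` and `prehighlight_all` are hypothetical helper names, and simple case-insensitive substring search stands in for real token-based match patterns. The example texts and the `BILLING` label are made up as well.

```python
from typing import Dict, Iterable, Iterator, List


def find_spans(text: str, phrases: List[str], label: str) -> List[Dict]:
    """Hypothetical helper: character-offset spans for every phrase hit.

    Uses case-insensitive substring search as a stand-in for token patterns.
    """
    spans = []
    lowered = text.lower()
    for phrase in phrases:
        if not phrase:
            continue  # an empty phrase would match everywhere, so skip it
        needle = phrase.lower()
        start = lowered.find(needle)
        while start != -1:
            end = start + len(needle)
            spans.append({"start": start, "end": end, "label": label})
            start = lowered.find(needle, end)
    return sorted(spans, key=lambda span: span["start"])


def match_only(stream: Iterable[Dict], phrases: List[str], label: str) -> Iterator[Dict]:
    """match-style: only yield examples that contain at least one match."""
    for eg in stream:
        spans = find_spans(eg["text"], phrases, label)
        if spans:
            yield {**eg, "spans": spans}


def prehighlight_all(stream: Iterable[Dict], phrases: List[str], label: str) -> Iterator[Dict]:
    """--patterns-style: yield every example, with any matches pre-highlighted."""
    for eg in stream:
        yield {**eg, "spans": find_spans(eg["text"], phrases, label)}


# Made-up data for illustration only
texts = [
    {"text": "Refund still not received after two weeks"},
    {"text": "Love the new dashboard"},
    {"text": "Please refund my order"},
]
phrases = ["refund"]

matched = list(match_only(texts, phrases, "BILLING"))
highlighted = list(prehighlight_all(texts, phrases, "BILLING"))
print(len(matched), len(highlighted))  # 2 3
```

With a rare category, the filtered stream is what gets you to positive examples quickly, while the pre-highlighted stream still makes you click through every non-matching text.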
Super clear explanation, thanks as always! Stay safe.