Match recipe: docs and distinction from ner.manual

Great to see the new match recipe!

  1. Looks like directly below the introductory recipe text, prodigy mark is noted in the command terminal rather than prodigy match. Super minor but wanted to raise it in case it throws some folks.
  2. It looks like ner.correct doesn't have a --patterns argument. Was this reference in the match docs intended to refer to ner.manual?
  3. Is the only major difference between ner.manual with match patterns and match that ner.manual requires a model? Just making sure I wrap my head around the different functions!



Thanks for the heads-up – 1 and 2 were both typos / copy-paste mistakes, and I also noticed that the docs didn't list the spacy_model argument. So sorry if this made things confusing! I've already fixed this and the update should be live in a second.

The main difference between the new match and ner.manual with --patterns is that match will only show you the matches, with different options for how to present them, and lets you accept or reject each one. If you use ner.manual with --patterns, you're still going through every single example, and if a pattern matches, the match is pre-highlighted.

However, if what you want to do is find examples via matches, going through every example with ner.manual isn't a good fit. This was kind of a gap in the API that I noticed when working on a small project. For example, you might be working on a text classification project with very imbalanced categories that make it difficult to get over the "cold start problem". So you could start by using match with a few patterns to quickly find enough positive examples for your category, then pretrain a model on that dataset and improve it further, e.g. using textcat.teach. (Early versions of the matching logic in textcat.teach tried to do this all in one by preprocessing the stream and starting with only matches – but this wasn't very transparent and a bit too "magical". So match lets you do this more explicitly in a separate step.)
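
To make the difference between the two stream behaviors concrete, here's a rough toy sketch in plain Python. It is not Prodigy's actual code: the helper functions (find_spans, manual_style_stream, match_style_stream) are hypothetical names made up for illustration, and simple case-insensitive substring search stands in for real match patterns.

```python
# Toy illustration only: these helpers are hypothetical and not part of Prodigy.
# Plain substring search stands in for real token-based match patterns.

def find_spans(text, patterns):
    """Return sorted (start, end, label) tuples for every case-insensitive hit."""
    spans = []
    lowered = text.lower()
    for label, phrase in patterns:
        needle = phrase.lower()
        start = lowered.find(needle)
        while start != -1:
            spans.append((start, start + len(needle), label))
            start = lowered.find(needle, start + 1)
    return sorted(spans)


def manual_style_stream(texts, patterns):
    """Roughly like ner.manual with --patterns: every example is shown,
    and any matches are pre-highlighted."""
    for text in texts:
        yield {"text": text, "spans": find_spans(text, patterns)}


def match_style_stream(texts, patterns):
    """Roughly like match: only examples with at least one match are shown."""
    for text in texts:
        spans = find_spans(text, patterns)
        if spans:
            yield {"text": text, "spans": spans}


texts = [
    "I love my new iPhone",
    "The weather is nice today",
    "iphone prices went up",
]
patterns = [("PRODUCT", "iphone")]

manual_examples = list(manual_style_stream(texts, patterns))
match_examples = list(match_style_stream(texts, patterns))

# The first stream keeps all examples; the second keeps only those with a match.
print(len(manual_examples), len(match_examples))
```

With the sample texts above, the first stream yields all three examples (one of them with no highlighted spans), while the second yields only the two that contain a match. That filtering is what makes the second style useful for quickly collecting positive examples.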

Super clear explanation. Thanks, as always! Stay safe.
