Textcat.teach not using the pattern file

Hi! I think the behaviour you're seeing might be related to what I describe in this thread:

In short, if you're starting from scratch with very unbalanced classes, or with a very large corpus that contains only few match candidates, it can happen that not enough initial matches are produced and that matches found later on are skipped because of their score.

When you run a teach recipe with patterns, Prodigy will combine the pattern matches and the model's suggestions. If no matches are found in a batch of examples, Prodigy will only yield the model's suggestions, which can be very random if the model hasn't learned anything yet.
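To make the fallback behaviour concrete, here's a minimal plain-Python sketch of that merging logic. It's an illustration of the behaviour described above, not Prodigy's actual implementation — the `pattern_matches` and `model_suggestions` functions are stand-ins for the real matcher and model:

```python
import random

random.seed(0)

def pattern_matches(batch):
    # Stand-in matcher: only examples containing a seed keyword match
    return [(0.9, eg) for eg in batch if "refund" in eg["text"]]

def model_suggestions(batch):
    # Stand-in for an untrained model: scores are essentially arbitrary
    return [(random.random(), eg) for eg in batch]

def combined_stream(batches):
    # If a batch yields no pattern matches, only the model's
    # (random) suggestions are left to annotate
    for batch in batches:
        matches = pattern_matches(batch)
        if matches:
            yield from matches
        else:
            yield from model_suggestions(batch)
```

With a sparse corpus, most batches fall through to the second branch, which is why the questions can feel random early on.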

The pattern matcher also assigns scores to the matches, based on how reliably each pattern produces a correct match. This makes sense when you're working with lots of patterns and matches, because you still want to focus on the most important examples – but in cases with low match density, it can cause the active learning algorithm to skip the few matches that do exist.
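Here's a tiny sketch of why that happens. The `prefer_uncertain` function below is a simplified stand-in for an uncertainty sorter (not Prodigy's actual one): it keeps examples scored close to 0.5, so a rare but confidently scored pattern match gets filtered out:

```python
def prefer_uncertain(scored_examples, threshold=0.2):
    # Keep only examples whose score is near 0.5 (most uncertain).
    # Simplified stand-in, not the real sorter.
    return [eg for score, eg in scored_examples if abs(score - 0.5) <= threshold]

stream = [
    (0.95, {"text": "clear pattern match"}),   # high score -> filtered out
    (0.48, {"text": "random model guess"}),    # near 0.5 -> kept
]
print(prefer_uncertain(stream))  # only the model's uncertain guess survives
```

With many matches this trade-off is fine, but when matches are scarce, losing even a few of them hurts.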

I've discussed some solutions in the thread linked above – for example, for the next release, we'll be updating the logic used to sort and merge the matcher's and the model's suggestions, to prevent matches from being skipped.

In the meantime, you could try using a separate step to bootstrap the model. The main problem the patterns are trying to solve is the cold start: you need enough initial training examples for the model to make meaningful suggestions. So you could first find the matches to bootstrap an initial training set, pre-train the model with that data, and then use textcat.teach to improve it.

One idea could be to repurpose the ner.match recipe and add a "label": "NEG" to the selected examples. You could also check out the recipe source and write your own, or implement a different matching logic with regular expressions etc.
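As a starting point for the regular-expressions route, here's a minimal sketch of that bootstrapping step. The seed patterns, the "NEG" label and the example texts are all placeholders — substitute your own terms and label scheme:

```python
import re

# Hypothetical seed patterns for a "NEG" label — replace with your own
PATTERNS = [re.compile(r"\b(awful|terrible|worst)\b", re.I)]

def bootstrap_examples(texts, label="NEG"):
    """Collect texts matching any seed pattern as pre-labelled examples."""
    examples = []
    for text in texts:
        if any(p.search(text) for p in PATTERNS):
            examples.append({"text": text, "label": label, "answer": "accept"})
    return examples

corpus = [
    "This product is awful and broke after a day.",
    "Delivery was quick, thanks!",
    "Worst customer service I've ever dealt with.",
]
seed_data = bootstrap_examples(corpus)
# seed_data now holds two pre-labelled examples you could pre-train with
```

You could review the collected examples manually (e.g. with prodigy mark), export them and use them to pre-train the model before switching to textcat.teach.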