Thanks for the report.
Seed-based bootstrapping works by interleaving the results from the matcher with the predictions from the model. The idea is that at first you'll only see results from the patterns, and then the model's predictions will gradually come in as the model learns the category from the patterns.
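To make the interleaving concrete, here's a toy sketch in plain Python. It isn't Prodigy's actual implementation: the `interleave` helper and the example strings are made up for illustration, and it only shows the general idea of taking turns between two streams.

```python
from itertools import zip_longest

_DONE = object()  # sentinel marking a stream that has run out


def interleave(*streams):
    """Yield one item from each stream in turn, skipping exhausted streams."""
    for group in zip_longest(*streams, fillvalue=_DONE):
        for item in group:
            if item is not _DONE:
                yield item


# Hypothetical stand-ins for what the matcher and the model produce.
pattern_hits = ["pattern hit 1", "pattern hit 2", "pattern hit 3"]

# Early on, a model that predicts nothing contributes nothing to the stream.
early = list(interleave(pattern_hits, []))

# Once the model starts making predictions, they alternate with the pattern hits.
later = list(interleave(pattern_hits, ["model guess 1", "model guess 2"]))
```

With an empty model stream you only get the pattern hits, which is the behaviour you'd expect at the start of a session.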
This is working correctly in the NER case, because the model doesn't start out predicting entities. And if you do text classification with non-mutually-exclusive classes, the model will also avoid predicting the labels initially.
The problem is that your model has exclusive classes, so the model's initial prediction is a score of 0.5 for each example. This means the initial examples are all from the model, rather than the patterns.
The problem should resolve itself if you click through a few batches of these random examples, rejecting them as incorrect predictions. You could also do an initial annotation session to start out with some number of predictions from the matcher first, before switching over to the combined model. This is actually what we used to do, which is why the version in the video behaves a little differently. However, the problem is that it's hard to guess what will work well on different problems, so we now try to avoid coding complicated behaviours into the default recipes. Instead I think it's usually better to provide simpler pieces, and let developers construct the desired behaviours themselves.
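As an example of building that kind of behaviour yourself, the "matcher first, then the combined model" warm-up could look roughly like this. It's a sketch with plain generators, not Prodigy code: `warm_up_then_combine`, `n_warm_up` and the two streams are hypothetical names I'm using for illustration.

```python
from itertools import chain, islice


def warm_up_then_combine(matcher_stream, combined_stream, n_warm_up):
    """Yield the first n_warm_up matcher results, then switch to the combined stream."""
    return chain(islice(matcher_stream, n_warm_up), combined_stream)


# Hypothetical stand-ins for the two sources of annotation tasks.
matcher_stream = ({"text": f"pattern match {i}", "source": "matcher"} for i in range(100))
combined_stream = ({"text": f"combined {i}", "source": "combined"} for i in range(100))

stream = warm_up_then_combine(matcher_stream, combined_stream, n_warm_up=3)
first_five = list(islice(stream, 5))
```

How large the warm-up should be is exactly the part that's hard to guess in general, which is why it's better left to you as a setting than hard-coded.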
Prodigy is driven by recipe scripts, which you can either edit or author yourself. You can get a good set of starter recipes from the repo here: https://github.com/explosion/prodigy-recipes . I think you might prefer to add a flag to the textcat.teach recipe that lets you only use the PatternMatcher, rather than combining it with the model. You could then annotate with just the patterns for a while, and use that to train an initial model.
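Here's a rough sketch of the branching logic such a flag would need. To keep it runnable on its own, it uses plain Python stand-ins rather than the real Prodigy classes: `patterns_only` is a hypothetical flag (it doesn't exist in textcat.teach today), and `model_predict`, `matcher_predict` and `combine` are toy placeholders for the model, the PatternMatcher and whatever the recipe uses to combine them.

```python
def build_predict(model_predict, matcher_predict, combine, patterns_only=False):
    """Choose what scores the stream.

    patterns_only is a hypothetical flag, not an existing recipe option.
    """
    if patterns_only:
        return matcher_predict
    return combine(model_predict, matcher_predict)


def model_predict(stream):
    # Toy stand-in: an untrained model that scores everything 0.5.
    for eg in stream:
        yield (0.5, eg)


def matcher_predict(stream):
    # Toy stand-in for pattern matching: only yields examples that match.
    for eg in stream:
        if "refund" in eg["text"].lower():
            yield (1.0, eg)


def combine(first, second):
    # Crude stand-in for combining two predictors: one after the other.
    def predict(stream):
        examples = list(stream)
        yield from first(examples)
        yield from second(examples)
    return predict


examples = [{"text": "I want a refund"}, {"text": "Great service"}, {"text": "Refund please"}]

predict = build_predict(model_predict, matcher_predict, combine, patterns_only=True)
patterns_only_texts = [eg["text"] for _, eg in predict(examples)]

predict = build_predict(model_predict, matcher_predict, combine, patterns_only=False)
combined_count = len(list(predict(examples)))
```

In your copy of the recipe, the same `if` would go wherever the model and the matcher currently get combined, with the flag exposed as a recipe argument.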