The most likely explanation is that no matches, or not enough matches, are found in the corpus or the respective batches. If `ner.teach` is used with patterns, the model and the pattern matcher are combined, and their results (pattern matches and predictions from the model) are merged by interleaving the two streams.
In an ideal case, that would look like this (numbers representing results from each source):

```python
from_patterns = [1, 2, 3, 4, 5]
from_model = [6, 7]
# merged stream: [1, 6, 2, 7, 3, 4, 5]
```
However, if the patterns don't produce any matches in a given batch, the combined stream will only contain the model's predictions.
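Prodigy's actual stream combiner is internal, but a minimal sketch of this kind of interleaving, using only the standard library, could look like this (the function name `interleave` is just for illustration):

```python
from itertools import zip_longest


def interleave(*streams):
    # Yield one item from each stream in turn; once a stream is
    # exhausted, keep cycling through the remaining ones.
    sentinel = object()
    for group in zip_longest(*streams, fillvalue=sentinel):
        for item in group:
            if item is not sentinel:
                yield item


from_patterns = [1, 2, 3, 4, 5]
from_model = [6, 7]
print(list(interleave(from_patterns, from_model)))
# [1, 6, 2, 7, 3, 4, 5]
```

Note that if `from_patterns` were empty here, the output would consist of the model's predictions only, which is exactly the situation described above.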
A simple solution could be to increase the `batch_size`, either in your recipe's config or your `prodigy.json`. Larger batches mean more potential for pattern matches. As a little sanity check, you might also want to try running spaCy's
`PhraseMatcher` over a portion of your corpus using the patterns you've created, just to verify that it indeed includes matches and that the matcher isn't thrown off by different tokenization etc.
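Such a sanity check could be as simple as the sketch below. The pattern terms and the sample text are hypothetical stand-ins; substitute the phrases from your own patterns file and texts from your own corpus, and use the same pipeline you annotate with so the tokenization matches:

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank pipeline is enough for a quick check; swap in the model
# you annotate with to reproduce its tokenization exactly.
nlp = spacy.blank("en")

# Hypothetical pattern terms -- replace with your own phrases.
terms = ["machine learning", "neural network"]
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("TECH", [nlp.make_doc(term) for term in terms])

# Run the matcher over a sample text from your corpus.
doc = nlp("We use machine learning to train a neural network.")
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

If this prints no matches on texts that you'd expect to match, the patterns themselves (or tokenization differences) are the problem, not the recipe.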