Prodigy using the model instead of the patterns during ner.teach

I was using ner.teach to learn to recognize date entities in particular contexts, with date-matching patterns as my seed. What I’ve seen in the past is that Prodigy starts by proposing candidates that exactly match the dates, and only after you have gone through a lot of those does it start making suggestions from the model, which is what Prodigy is supposed to do.

I tried this on a new corpus, and the candidates proposed by Prodigy start off coming from the model. Since the model is untrained, they’re essentially random. It looks like the patterns are never used.

I don’t understand how this could be happening. I’m running everything exactly the same way as before; the only difference is the corpus. The one odd thing is that this new corpus is tiny, on the order of 100 candidate entities.

Does this sound like a bug, or is there some corner case for small corpora that would compel Prodigy to use the model instead of the patterns?

The most likely explanation is that no matches, or not enough matches, are found in the corpus or in the respective batches. If ner.teach is used with patterns, the model and the pattern matcher are combined, and their results (pattern matches and model predictions) are merged using the toolz.interleave function.

In an ideal case, that would look like this (with each number representing a result):

from toolz import interleave

from_patterns = [1, 2, 3, 4, 5]  # results from the pattern matcher
from_model = [6, 7]              # results from the model
list(interleave((from_patterns, from_model)))
# [1, 6, 2, 7, 3, 4, 5]

However, if the patterns don’t produce any matches in that batch, the combined model will only output the model’s predictions.
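For example, if the pattern matcher comes up empty for a batch, the merged stream falls through to the model’s suggestions only (a minimal sketch with made-up numbers):

from toolz import interleave

from_patterns = []   # no pattern matches found in this batch
from_model = [6, 7]
list(interleave((from_patterns, from_model)))
# [6, 7]  -> only the model's suggestions are shown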

A simple solution could be to increase the "batch_size", either in your recipe’s config or in your prodigy.json. Larger batches mean more potential for pattern matches. As a little sanity check, you might also want to run spaCy’s Matcher or PhraseMatcher over a portion of your corpus using the patterns you’ve created, just to verify that the corpus actually contains matches and that the matcher isn’t thrown off by different tokenization etc.
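A rough sketch of that sanity check, assuming token-based patterns stored in a patterns.jsonl file and spaCy v3’s Matcher API (the file names and the model name are placeholders, adjust them to your setup):

import spacy
import srsly
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")  # ideally the same base model you pass to ner.teach
matcher = Matcher(nlp.vocab)

# Each line in the patterns file looks like {"label": "DATE", "pattern": [...]}
for entry in srsly.read_jsonl("date_patterns.jsonl"):
    matcher.add(entry["label"], [entry["pattern"]])

total = 0
for eg in srsly.read_jsonl("my_corpus.jsonl"):
    doc = nlp(eg["text"])
    for match_id, start, end in matcher(doc):
        total += 1
        print(doc[start:end].text)

print("total matches:", total)

If this prints zero matches, the problem is in the patterns or the tokenization rather than in ner.teach. Note that string patterns meant for the PhraseMatcher would need the PhraseMatcher here instead of the Matcher.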

Makes sense. Thanks.