patterns in ner.match and ner.teach

andrewx24 · May 28, 2019, 7:37pm

Hello,

I was wondering if anyone knows what the difference is between using patterns for ner.teach and ner.match?

Thanks

ines · May 29, 2019, 8:33am

Hi! You might find this answer on the following thread useful: What are the key differences between ner.teach and ner.match?


ner.teach uses a statistical model in the loop, updates it with your accept/reject answers as you annotate, and as the model learns about the entities, it combines the pattern matches with model suggestions. To make the active learning work, ner.teach also uses a sorter to prioritise examples with a prediction closest to 0.5, i.e. the ones that the model is most uncertain about. This means you won't necessarily see all examples, only a selection. Running the recipe with patterns helps you get over the so-called "cold-start problem", i.e. training a new label from scratch that the model doesn't know anything about yet. In order for the model to make meaningful suggestions, it needs to have seen enough positive examples – and that's where the patterns come in. If the model already predicts something for the entities you want to train, you can also use ner.teach without patterns and simply accept / reject the model's suggestions.
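To make the "closest to 0.5" idea concrete, here is a minimal sketch of uncertainty-based filtering in plain Python. It is not Prodigy's sorter implementation: the `(text, score)` example shape, the `uncertainty` helper and the fixed `threshold` are all hypothetical simplifications. It only shows why confident predictions (near 0 or 1) get skipped and you end up seeing a selection rather than every example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical example shape: (text, score), where score is the model's
# confidence (0.0 to 1.0) that a suggested span is a correct entity.
Example = Tuple[str, float]


def uncertainty(score: float) -> float:
    """Return 1.0 when the model is maximally unsure (score == 0.5), 0.0 at 0 or 1."""
    return 1.0 - abs(score - 0.5) * 2


def prefer_uncertain_sketch(
    stream: Iterable[Example], threshold: float = 0.5
) -> Iterator[Example]:
    """Yield only examples whose uncertainty is at least `threshold`.

    Simplified stand-in for an active-learning sorter: confident predictions
    are dropped, so the annotator never sees them.
    """
    for text, score in stream:
        if uncertainty(score) >= threshold:
            yield text, score


stream = [("Apple", 0.97), ("Berlin", 0.52), ("banana", 0.03), ("Acme", 0.41)]
print(list(prefer_uncertain_sketch(stream)))  # [('Berlin', 0.52), ('Acme', 0.41)]
```

A real sorter works on a live stream and can adapt to the scores it has seen; the fixed threshold here is just the simplest way to illustrate the selection effect.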

ner.match, on the other hand, doesn't use or update the statistical model – it only finds pattern matches in your text and asks you for feedback on them as they come in, in the exact order, without skipping any. It can be a useful recipe if you already have a large terminology list or other patterns describing the entities you're looking for, and want to collect data quickly without having to highlight anything by hand. It also lets you write more general patterns that potentially produce false positives (like "two uppercase tokens"), move through them quickly and collect both positive and negative examples for this entity type. That can be super valuable – you might see significantly better results if your data includes both "perfect" examples of entity types, as well as examples of spans that look very similar but are not part of the entity type.
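As a rough illustration of the two kinds of patterns described above, the sketch below writes a JSONL patterns file: one object per line with a `"label"` and a `"pattern"` that is either an exact string (from a terminology list) or a list of spaCy token-attribute dicts (the general "two uppercase tokens" case). The `ORG` label, the terms and the filename `patterns.jsonl` are made up for this example; check the Prodigy docs for the exact format your version expects.

```python
import json

# Hypothetical terminology list for a hypothetical ORG label.
terms = ["Acme Corp", "Globex", "Initech"]

# Exact-string patterns, one per term in the list.
patterns = [{"label": "ORG", "pattern": term} for term in terms]

# A deliberately general token pattern: two consecutive all-uppercase tokens.
# It will also match non-entities such as "THE END"; rejecting those matches
# during annotation gives you useful negative examples.
patterns.append({"label": "ORG", "pattern": [{"IS_UPPER": True}, {"IS_UPPER": True}]})

# Write one JSON object per line (JSONL).
with open("patterns.jsonl", "w", encoding="utf-8") as f:
    for p in patterns:
        f.write(json.dumps(p) + "\n")
```

The same file shape is what you would feed to either recipe; the difference discussed in this thread is only in how each recipe uses the matches.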

So, in terms of the use of patterns, this means that ner.teach uses patterns to bootstrap the annotation process and doesn't necessarily show you all the matches. It just tries to produce enough positive examples so that the model can kick in. ner.match uses the patterns to show you all matches in your data, one by one, as they come in.
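The "show you all matches, one by one, as they come in" behaviour of ner.match can be sketched as a plain generator that never filters or reorders. This is a toy model, not Prodigy code: `two_upper_tokens` is a hypothetical stand-in for a spaCy token pattern (it uses Python's `str.isupper`), and texts are assumed to be pre-tokenized lists of strings.

```python
from typing import Callable, Iterator, List, Tuple

Span = Tuple[int, int]  # (start token index, end token index), end exclusive


def two_upper_tokens(tokens: List[str]) -> List[Span]:
    """Toy stand-in for the general pattern: two consecutive all-uppercase tokens."""
    return [
        (i, i + 2)
        for i in range(len(tokens) - 1)
        if tokens[i].isupper() and tokens[i + 1].isupper()
    ]


def match_stream(
    texts: List[List[str]], matcher: Callable[[List[str]], List[Span]]
) -> Iterator[Tuple[int, Span]]:
    """ner.match-style stream: every match, in document order, none skipped."""
    for doc_id, tokens in enumerate(texts):
        for span in matcher(tokens):
            yield doc_id, span


texts = [
    ["I", "moved", "to", "NEW", "YORK", "."],  # plausible entity
    ["THE", "END"],                            # false positive: reject it
    ["Nothing", "here"],                       # no match
]
print(list(match_stream(texts, two_upper_tokens)))  # [(0, (3, 5)), (1, (0, 2))]
```

Both matches are shown, including the false positive, which is exactly what lets you collect negative examples. ner.teach, by contrast, mixes pattern matches with model suggestions and lets its sorter decide which ones you see.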

Anji.Vaidyula · July 10, 2019, 12:34pm

Quick question on ner.match: if it doesn't use or update a model, why does it take a model as input?

ines · July 10, 2019, 12:54pm

spaCy’s Matcher and PhraseMatcher need a tokenizer and a vocab so they can do token-based matching. A model package is the easiest way to pass that in, and it also lets you provide custom models with your own tokenization etc.
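Why tokenization matters for matching can be shown without spaCy at all: a token-based pattern compares sequences of tokens, so the same text can match or not depending on how it was split. Both tokenizers below are hypothetical toys (a whitespace split and a crude regex that splits off punctuation), not spaCy's tokenizer; they only illustrate why the Matcher needs the same tokenizer as the rest of your pipeline.

```python
import re
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Toy tokenizer: split on whitespace only, so punctuation sticks to words."""
    return text.split()


def punct_tokenize(text: str) -> List[str]:
    """Toy tokenizer: split off punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def matches_token_sequence(tokens: List[str], pattern: List[str]) -> bool:
    """True if `pattern` occurs as a contiguous run of tokens."""
    n = len(pattern)
    return any(tokens[i:i + n] == pattern for i in range(len(tokens) - n + 1))


text = "They hired Acme Corp, again."
pattern = ["Acme", "Corp"]
print(matches_token_sequence(whitespace_tokenize(text), pattern))  # False: token is "Corp,"
print(matches_token_sequence(punct_tokenize(text), pattern))       # True
```

Passing in a model package sidesteps this mismatch: patterns are matched against exactly the tokens that model's tokenizer produces, including any custom tokenization rules it ships with.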
