ner.manual pattern file

ines · August 21, 2021, 3:02am

Hi! The ner.manual workflow is fully manual and won't use the model for its predictions (only for tokenization). The main use case of patterns here is to help you pre-label common instances so you don't have to do everything from scratch.
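For reference, a patterns file is a JSONL file with one pattern per line, each with a "label" and a "pattern" that is either an exact string or a list of token attribute dicts in spaCy's Matcher format. Below is a minimal stdlib-only sketch that writes such a file. The labels, terms and the file name patterns.jsonl are placeholders, and splitting terms on whitespace assumes your tokenizer splits them the same way.

```python
import json
from pathlib import Path

# Placeholder labels and terms: swap in your own categories.
TERMS = {
    "FRUIT": ["apple", "banana"],
    "VEGETABLE": ["sweet potato"],
}


def build_patterns(terms):
    """Create one case-insensitive, token-based pattern per term."""
    patterns = []
    for label, phrases in terms.items():
        for phrase in phrases:
            # One {"lower": ...} dict per whitespace-separated token
            # (assumes the tokenizer splits the phrase the same way).
            tokens = [{"lower": tok.lower()} for tok in phrase.split()]
            patterns.append({"label": label, "pattern": tokens})
    return patterns


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    write_jsonl("patterns.jsonl", build_patterns(TERMS))
```

You'd then pass the file to the recipe, e.g. `prodigy ner.manual your_dataset blank:en ./your_data.jsonl --label FRUIT,VEGETABLE --patterns patterns.jsonl` (dataset and source names are placeholders).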

The ner.correct workflow will stream in the model's predictions and let you correct them manually. However, it doesn't have an option for patterns, because that would raise the tricky question of how to deal with conflicts and overlaps between pattern matches and predictions, which happen often. I explain this in more detail in this thread:

spans.correct recipe

This is a bit more difficult and introduces the problem of how matches vs. predictions should be handled, and which to prefer in case there are overlaps. For NER use cases, you could default to showing either the prediction or the pattern match if they disagree. It's often useful to see both, but you still want to make sure that your final data ends up with only one version. And while the span categorizer can predict overlapping spans, you'd often still want to pick the one span that's most consistent. For instance, the model may predict "the" + noun phrase, while your pattern describes only the noun phrase. In that case, you want to make sure that your final data only ends up with one of them, not both. The "comparing annotations" workflow described in this issue goes in a similar direction, and it's definitely something you could implement in a custom recipe: Recipe for comparing NER model and manual annotation - #3 by haishao
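To make the "the" + noun phrase case concrete, here's a small pure-Python sketch of the first step such a comparison recipe might take: pairing up predictions and pattern matches that overlap without being identical. The `Span` tuple (token offsets, exclusive end, as in spaCy) and `find_conflicts` are hypothetical stand-ins, not Prodigy or spaCy APIs.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int  # first token index
    end: int    # token index after the last token (exclusive, as in spaCy)
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """True if the two token ranges share at least one token."""
    return a.start < b.end and b.start < a.end


def find_conflicts(predictions: List[Span], matches: List[Span]) -> List[Tuple[Span, Span]]:
    """Pair each prediction with every pattern match that overlaps it but
    isn't identical (same boundaries and label)."""
    conflicts = []
    for pred in predictions:
        for match in matches:
            if overlaps(pred, match) and pred != match:
                conflicts.append((pred, match))
    return conflicts


# "I ate the sweet potato": tokens I(0) ate(1) the(2) sweet(3) potato(4)
predictions = [Span(2, 5, "FOOD")]  # model: "the sweet potato"
matches = [Span(3, 5, "FOOD")]      # pattern: "sweet potato"
print(find_conflicts(predictions, matches))
# [(Span(start=2, end=5, label='FOOD'), Span(start=3, end=5, label='FOOD'))]
```

Because the comparison includes the label, two spans with the same boundaries but different labels are also reported as a conflict.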

One option could be to use ner.manual with patterns for your new categories, and ner.correct with the existing model for all others. When you train your model, Prodigy will automatically merge all annotations on the same text, so it's fine if you have the same example annotated twice with different labels.
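Prodigy takes care of this merge for you at training time. Purely to illustrate the idea, here's a hypothetical sketch that combines the spans of accepted examples sharing the same text, using Prodigy's task format ("text", "answer", and "spans" with character offsets). It keys on the raw text for simplicity, whereas Prodigy tracks examples via its own hashes, and it makes no attempt to resolve overlapping spans.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def merge_by_text(examples: List[dict]) -> List[dict]:
    """Combine the spans of all accepted examples that share the same text.

    Hypothetical illustration: keys on the raw text and keeps every distinct
    span, without resolving overlaps.
    """
    merged: Dict[str, Set[Tuple[int, int, str]]] = defaultdict(set)
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        spans = merged[eg["text"]]  # registers the text even if it has no spans
        for span in eg.get("spans", []):
            spans.add((span["start"], span["end"], span["label"]))
    return [
        {
            "text": text,
            "spans": [
                {"start": start, "end": end, "label": label}
                for start, end, label in sorted(spans)
            ],
        }
        for text, spans in merged.items()
    ]


# One example from a ner.manual dataset (new FRUIT label) and one from a
# ner.correct dataset (existing ORG label), both for the same text.
examples = [
    {"text": "Apple sells apples", "answer": "accept",
     "spans": [{"start": 12, "end": 18, "label": "FRUIT"}]},
    {"text": "Apple sells apples", "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
]
print(merge_by_text(examples))
```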

Alternatively, you could implement a small variation of ner.manual that also includes the model's predictions. You just need to make sure that the data you send out doesn't include any overlaps. For example, you could filter the spans using spaCy's filter_spans utility, which keeps the longest span when spans overlap (and the first one if they're the same length). You could also decide to prefer pattern matches over predictions, or vice versa, or make that decision on a per-label basis. Ultimately, this depends on the data and on which types of conflicts are most common.
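To see how these strategies differ, here's a pure-Python sketch that resolves overlaps greedily using a priority expressed as a sort key. Assumptions: spans are `(start, end, label, source)` tuples with token offsets, and `filter_overlaps`, `longest_first`, `prefer_source` and `prefer_by_label` are hypothetical helpers, not Prodigy or spaCy APIs. `longest_first` aims to mirror the ordering of spaCy's `filter_spans`; in a real recipe working with spaCy `Span` objects you'd just call `spacy.util.filter_spans` for that default.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int   # first token index
    end: int     # token index after the last token (exclusive)
    label: str
    source: str  # hypothetical field: "model" or "pattern"


SortKey = Callable[[Span], Tuple]


def filter_overlaps(spans: List[Span], key: SortKey) -> List[Span]:
    """Keep spans greedily in descending `key` order, dropping any span that
    shares a token with a span that was already kept."""
    kept: List[Span] = []
    seen = set()
    for span in sorted(spans, key=key, reverse=True):
        tokens = set(range(span.start, span.end))
        if tokens & seen:
            continue  # conflicts with a higher-priority span
        kept.append(span)
        seen |= tokens
    return sorted(kept, key=lambda s: s.start)


def longest_first(span: Span) -> Tuple:
    """Longer spans win; among equal lengths, the earlier span wins."""
    return (span.end - span.start, -span.start)


def prefer_source(preferred: str) -> SortKey:
    """Spans from `preferred` win; length and position break ties."""
    def key(span: Span) -> Tuple:
        return (span.source == preferred,) + longest_first(span)
    return key


def prefer_by_label(rules: Dict[str, str], default: str) -> SortKey:
    """Like prefer_source, but the preferred source depends on the label."""
    def key(span: Span) -> Tuple:
        return (span.source == rules.get(span.label, default),) + longest_first(span)
    return key


# "I ate the sweet potato": tokens I(0) ate(1) the(2) sweet(3) potato(4)
candidates = [
    Span(2, 5, "FOOD", "model"),    # model predicts "the sweet potato"
    Span(3, 5, "FOOD", "pattern"),  # pattern matches "sweet potato"
]
print(filter_overlaps(candidates, longest_first))              # keeps the model span
print(filter_overlaps(candidates, prefer_source("pattern")))   # keeps the pattern span
print(filter_overlaps(candidates, prefer_by_label({"FOOD": "pattern"}, "model")))
```

Expressing the preference as a sort key keeps the filtering logic in one place: adding another rule (say, preferring certain labels outright) only means writing another key function.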
