ner.correct and patterns

alphie · July 23, 2024, 10:31am

Is there a way to use a patterns file alongside ner.correct?

I am extending my model to PDFs with a different layout, and ner.correct already gets a lot of the desired entities, which reduces the annotation effort (great).

Unfortunately, one entity (e.g. INV_NUM) is a bit different from what the original model was trained on. In the original training data, INV_NUM was quite varied, but in the new layout it could easily be found with a pattern, e.g. XXdddddd.

I see my choices as:

1. Reannotate everything using patterns.
2. Use ner.correct, saving lots of annotation time, but having to reannotate INV_NUM.
3. Or (maybe I'm being greedy here): could I use ner.correct and a patterns file together?
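For reference, a patterns file for the new layout could be generated like this. This is a minimal sketch that assumes "XXdddddd" means two uppercase ASCII letters followed by six digits; the file name inv_num_patterns.jsonl and the sample tokens are hypothetical. Each line is a JSON object with a label and a spaCy token pattern, which is the JSONL shape Prodigy's patterns files use.

```python
import json
import re

# Assumption: "XXdddddd" = two uppercase ASCII letters followed by six digits.
INV_NUM_REGEX = r"^[A-Z]{2}\d{6}$"

# One token-based pattern: a single token whose text matches the regex.
patterns = [
    {"label": "INV_NUM", "pattern": [{"TEXT": {"REGEX": INV_NUM_REGEX}}]},
]


def write_patterns(path, patterns):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf8") as f:
        for p in patterns:
            f.write(json.dumps(p) + "\n")


# Hypothetical output file name.
write_patterns("inv_num_patterns.jsonl", patterns)

# Quick sanity check of the regex against made-up tokens.
for token in ["AB123456", "ab123456", "AB12345", "INV-0042"]:
    print(token, bool(re.match(INV_NUM_REGEX, token)))
```

A regex on TEXT is used rather than a SHAPE pattern because spaCy's token shape truncates runs of the same character class after length 4, so "AB123456" has the shape "XXdddd", not "XXdddddd". A file like this can be passed to ner.manual with its --patterns argument so that matches are pre-highlighted.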

magdaaniol · July 23, 2024, 5:51pm

Hi @alphie,

The reason the built-in correct recipes don't support patterns is that it's not always clear how to reconcile conflicting predictions from the model and the rule-based component, mostly because it's hard to make a local decision that would be consistent globally.
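To make the reconciliation problem concrete, here is a minimal sketch of one possible local policy. It is not what Prodigy does: the policy ("pattern matches win, any overlapping model span is dropped"), the example text, and the spans are all hypothetical. Spans use character offsets with start, end and label keys.

```python
from typing import Any, Dict, List

Span = Dict[str, Any]  # {"start": int, "end": int, "label": str}


def overlaps(a: Span, b: Span) -> bool:
    """True if two character-offset spans share at least one character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def merge_spans(model_spans: List[Span], pattern_spans: List[Span]) -> List[Span]:
    """Hypothetical policy: keep every pattern span, drop overlapping model spans."""
    kept = [m for m in model_spans if not any(overlaps(m, p) for p in pattern_spans)]
    return sorted(kept + list(pattern_spans), key=lambda s: s["start"])


text = "Invoice AB123456 from ACME Ltd"
model_spans = [
    {"start": 8, "end": 12, "label": "INV_NUM"},  # model only caught "AB12"
    {"start": 22, "end": 30, "label": "ORG"},     # "ACME Ltd"
]
pattern_spans = [{"start": 8, "end": 16, "label": "INV_NUM"}]  # "AB123456"

merged = merge_spans(model_spans, pattern_spans)
print([(text[s["start"]:s["end"]], s["label"]) for s in merged])
```

The rule works for this example, but it would equally discard a correct model span that happens to overlap a spurious pattern match, which is the kind of locally reasonable but globally inconsistent decision described above.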

I think there are several options available in your case:

1. Make a ner.manual pass with the INV_NUM pattern over your current dataset, focusing only on this label, then retrain the model and continue with your ner.correct workflow. This way you ensure that the reannotated INV_NUM is consistent, and the model corrections are done more efficiently.
2. Modify your spaCy pipeline to include the rule-based NER before the trained NER, so that the final output takes the patterns into account. Here are the spaCy docs on how to combine patterns with a trained NER component: Rule-based matching · spaCy Usage Documentation
3. Write a custom ner.correct recipe by replicating the logic from ner.teach that combines the model's predictions with patterns. You can inspect the code for both recipes in your Prodigy installation path (run prodigy stats and check Location to see where that is on your machine).
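The second option above could look roughly like this in spaCy v3. This is a minimal sketch: it uses spacy.blank("en") so it runs without downloading a model, and the INV_NUM regex assumes two uppercase letters plus six digits. With your own trained pipeline you would load it with spacy.load(...) and add the ruler with before="ner", so the statistical NER respects the ruler's spans and predicts around them.

```python
import spacy

# Self-contained stand-in for a trained pipeline. With a real one:
#   nlp = spacy.load("path/to/your/pipeline")
#   ruler = nlp.add_pipe("entity_ruler", before="ner")
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Assumed INV_NUM format: two uppercase letters followed by six digits.
ruler.add_patterns([
    {"label": "INV_NUM", "pattern": [{"TEXT": {"REGEX": r"^[A-Z]{2}\d{6}$"}}]},
])

doc = nlp("Invoice AB123456 issued to ACME Ltd")
print([(ent.text, ent.label_) for ent in doc.ents])
```

Placing the ruler before the NER means pattern matches take priority, while the trained model still handles the labels the patterns don't cover. The combined pipeline can be saved with nlp.to_disk(...) and loaded by path.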

Personally, I would recommend 1 or 2, as they are cleaner and more tractable with respect to data management, and easier/faster to implement.
