ner.correct and patterns

Is there a way to use a patterns file alongside ner.correct?

I am extending my model to PDFs with a different layout, and ner.correct already finds many of the desired entities, which reduces the annotation work (great).

Unfortunately, one entity (e.g. INV_NUM) is a bit different from what the original model was trained on. In the original data INV_NUM was quite varied, but in the new layout it could easily be found with a pattern, e.g. XXdddddd.

I see my choices as:

  • reannotate everything using patterns
  • use ner.correct saving lots of annotation time, but having to reannotate INV_NUM
  • OR (maybe being greedy here) – could I use ner.correct and a patterns file together?

Hi @alphie,

The reason the built-in correct recipes do not accept patterns is that it is not always clear how to reconcile conflicting predictions from the model and the rule-based component, mostly because it is hard to make a local decision that would be consistent globally.

I think there are several options available in your case:

  1. Make a ner.manual pass with the INV_NUM pattern on your current dataset, focusing just on this label, then retrain the model and continue with your ner.correct workflow. This way you ensure that the reannotated INV_NUM is consistent and the model corrections are done more efficiently.

  2. You could modify your spaCy pipeline to include a rule-based NER component before the trained NER, so that the final output takes the patterns into account. Here are the spaCy docs on how to combine patterns with a trained NER component: Rule-based matching · spaCy Usage Documentation

  3. You could write a custom ner.correct recipe that replicates the logic ner.teach uses to combine the model's predictions with patterns. You can inspect the code for both recipes in your Prodigy installation path (run prodigy stats and check Location to see where that is on your machine).
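
To make options 1 and 2 a bit more concrete, here is a minimal sketch. It assumes the new INV_NUM format is exactly two uppercase letters followed by six digits (my reading of XXdddddd, so adjust the regex to your data). The helper names (`write_patterns`, `add_ruler`, `demo`) and the file name are hypothetical, and `spacy.blank("en")` only stands in for your own trained pipeline.

```python
import json
import re

# Assumption: two uppercase letters followed by six digits ("XXdddddd").
INV_NUM_REGEX = r"^[A-Z]{2}\d{6}$"

# Token-based pattern in the format shared by spaCy's entity ruler and
# Prodigy patterns files. A regex is used instead of the SHAPE attribute
# because spaCy truncates long runs of the same character in token shapes.
PATTERNS = [{"label": "INV_NUM", "pattern": [{"TEXT": {"REGEX": INV_NUM_REGEX}}]}]


def write_patterns(path):
    """Option 1: write the patterns as JSONL, one pattern per line."""
    with open(path, "w", encoding="utf8") as f:
        for pattern in PATTERNS:
            f.write(json.dumps(pattern) + "\n")


def add_ruler(nlp):
    """Option 2: add an entity ruler ahead of the trained NER, if present."""
    if "ner" in nlp.pipe_names:
        # Placed before "ner", so the model predicts around the rule matches.
        ruler = nlp.add_pipe("entity_ruler", before="ner")
    else:
        ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(PATTERNS)
    return nlp


def demo():
    """Try the ruler on one sentence (requires spaCy v3 to be installed)."""
    import spacy

    # Stand-in pipeline; replace with spacy.load(<your trained pipeline>).
    nlp = add_ruler(spacy.blank("en"))
    doc = nlp("Please pay invoice AB123456 by Friday.")
    return [(ent.text, ent.label_) for ent in doc.ents]
```

For option 1, the file written by `write_patterns("inv_num_patterns.jsonl")` could then be passed to something like `prodigy ner.manual your_dataset blank:en your_data.jsonl --label INV_NUM --patterns inv_num_patterns.jsonl`, where the dataset and source names are placeholders. For option 2, you would save the pipeline returned by `add_ruler` and point ner.correct at it instead of the original model.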

I would personally recommend 1 or 2, as these are 1) cleaner and more tractable with respect to data management and 2) easier and faster to implement.