How to use a spaCy pattern in Prodigy

Hi! Sorry if this wasn't fully clear – I'll see if we can add the pattern file details more prominently in the docs 🙂

The good news is, spaCy patterns are fully compatible with Prodigy. So in order to use your existing patterns, all you have to do is create a file like patterns.jsonl containing one JSON object per line, each with the keys "label" and "pattern". For example:

{"label": "YOUR_LABEL", "pattern": [{"IS_ASCII": true}, {"ORTH": "-"}, {"IS_ASCII": true}]}

This is also the same format used by spaCy's new EntityRuler btw – so if you've been working with that, you can reuse the exact same pattern files.
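If your patterns currently live in Python code, here's a minimal sketch of writing them out in that one-object-per-line format using only the standard library. The helper names write_patterns and read_patterns are made up for this example and aren't part of spaCy or Prodigy:

```python
import json
from pathlib import Path

# The example pattern from above, as Python data. Note that JSON's
# lowercase `true` corresponds to Python's `True`.
patterns = [
    {
        "label": "YOUR_LABEL",
        "pattern": [{"IS_ASCII": True}, {"ORTH": "-"}, {"IS_ASCII": True}],
    },
]


def write_patterns(patterns, path):
    """Write one JSON object per line (hypothetical helper)."""
    with Path(path).open("w", encoding="utf8") as f:
        for entry in patterns:
            f.write(json.dumps(entry) + "\n")


def read_patterns(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with Path(path).open("r", encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


write_patterns(patterns, "patterns.jsonl")
```

Reading the file back with read_patterns is a quick way to check that every line is valid JSON before you pass the file to a recipe.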

To test your patterns, you can use the ner.match recipe, which will show you all matches in the data and ask you to accept / reject them. For example:

prodigy ner.match your_dataset en_core_web_sm /path/to/your_data.jsonl --patterns /path/to/patterns.jsonl --label YOUR_LABEL

The ner.make-gold workflow currently doesn't have a --patterns argument – it really only goes through the doc.ents set by a spaCy model, pre-highlights them in the texts and lets you correct those entities manually. However, thanks to spaCy v2.1 and the new EntityRuler, you can still make this work:

  • Create a new EntityRuler and add your patterns to it (see spaCy's documentation on rule-based entity recognition for more info).
  • Load a pre-trained model and add the entity ruler to the pipeline.
  • Save the modified model with the entity ruler to disk using nlp.to_disk – the entity ruler and its patterns will be serialized automatically and loaded back in when you load the model. The doc.ents set by that model now include the pattern matches.
  • Load the saved model into ner.make-gold and annotate entity predictions plus pattern matches:

prodigy ner.make-gold your_dataset /path/to/saved-model /path/to/your_data.jsonl --label YOUR_LABEL
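
The first three steps could look roughly like the sketch below. A few things here are my assumptions rather than requirements: the helper name add_entity_ruler is made up, the ruler is placed before the "ner" component (if there is one) so that pattern matches take precedence over the model's predictions, and spacy.blank("en") plus a temporary directory stand in for your pre-trained model and /path/to/saved-model so the snippet runs without a model download. The version check is only there because spaCy v3 changed how components are added to the pipeline.

```python
import tempfile
from pathlib import Path

import spacy

patterns = [
    {
        "label": "YOUR_LABEL",
        "pattern": [{"IS_ASCII": True}, {"ORTH": "-"}, {"IS_ASCII": True}],
    },
]


def add_entity_ruler(nlp, patterns):
    """Add an entity ruler with the given patterns (hypothetical helper)."""
    # Assumption: run the ruler before the statistical NER, if present.
    position = {"before": "ner"} if "ner" in nlp.pipe_names else {}
    if spacy.__version__.startswith("2."):
        # spaCy v2.1+: create the component object, then add it.
        from spacy.pipeline import EntityRuler

        ruler = EntityRuler(nlp)
        ruler.add_patterns(patterns)
        nlp.add_pipe(ruler, **position)
    else:
        # spaCy v3+: components are added by their registered name.
        ruler = nlp.add_pipe("entity_ruler", **position)
        ruler.add_patterns(patterns)
    return nlp


# Stand-in for spacy.load("en_core_web_sm") or your own model.
nlp = add_entity_ruler(spacy.blank("en"), patterns)

# Stand-in for /path/to/saved-model.
output_dir = Path(tempfile.mkdtemp()) / "saved-model"
nlp.to_disk(output_dir)

# The patterns are serialized with the model and restored on load.
reloaded = spacy.load(output_dir)
doc = reloaded("This is a well-known example.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

With a real pre-trained model in place of the blank one, the doc.ents you get back contain both the model's predictions and the pattern matches, which is what ner.make-gold will pre-highlight.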