Formatting Prodigy annotations for evaluation of external NER models using spaCy


I have some already-trained NER models I'd like to evaluate, so I'm using Prodigy to annotate my text with labels (e.g. "NAME"), and then I plan to use spaCy's Scorer to compare my models' labels against the gold standard. My current plan is to use the ner.manual recipe to create the gold-standard annotations in Prodigy. I've also written a custom Prodigy recipe that sends my input data to my NER model and creates a dataset with its annotations. From there, I've exported the gold-standard annotations with db-out to .jsonl, and I'm trying to convert that output to a Doc object to serve as the reference in an Example object that I can pass to spaCy's Scorer. However, I'm having a hard time converting the Prodigy .jsonl output to a Doc object, because it doesn't have all the attributes I need to build a complete Example object.
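A minimal sketch of the conversion described above, assuming spaCy v3 and that each db-out line has the `text`, `tokens` (each with `text` and `ws`) and `spans` (each with character `start`/`end` and `label`) fields that ner.manual normally writes. Rebuilding the Doc from Prodigy's own tokens keeps the gold tokenization identical to what was annotated. The helper names `prodigy_record_to_doc`, `load_reference_docs` and `score_ents` are hypothetical.

```python
import json

from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example


def prodigy_record_to_doc(vocab, record):
    """Build a gold Doc from one ner.manual record (one line of db-out JSONL)."""
    # Reuse Prodigy's tokens so the gold tokenization matches what was annotated
    words = [tok["text"] for tok in record["tokens"]]
    spaces = [bool(tok.get("ws", True)) for tok in record["tokens"]]
    doc = Doc(vocab, words=words, spaces=spaces)
    ents = []
    for span in record.get("spans", []):
        ent = doc.char_span(span["start"], span["end"], label=span["label"])
        if ent is None:  # character offsets don't fall on token boundaries
            raise ValueError(f"Misaligned span {span} in {record['text']!r}")
        ents.append(ent)
    doc.ents = ents
    return doc


def load_reference_docs(path, vocab):
    """Read a db-out JSONL file and keep only accepted annotations."""
    docs = []
    with open(path, encoding="utf8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("answer") == "accept":
                docs.append(prodigy_record_to_doc(vocab, record))
    return docs


def score_ents(predicted_docs, reference_docs):
    """Pair predictions with gold Docs (same texts, same order) and score doc.ents."""
    examples = [Example(pred, ref) for pred, ref in zip(predicted_docs, reference_docs)]
    return Scorer.score_spans(examples, "ents")
```

`Scorer.score_spans(examples, "ents")` returns `ents_p`, `ents_r`, `ents_f` and `ents_per_type`. The predicted Docs need the same text as their reference Docs so that each Example can align them.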

Do you have any suggestions for moving between Prodigy and spaCy when annotating the data and then scoring it? I tried using data-to-spacy, but since I only care about evaluating the models, not training them, I'm not sure how to get my evaluation scores (most importantly, the confusion matrix of true positives, true negatives, etc.) without going through the training step as well.
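On the confusion-matrix point: span-level NER evaluation generally has no meaningful true-negative count, so precision and recall are built from true positives, false positives and false negatives. If those raw counts are what you need, a plain-Python sketch comparing the two exported datasets could look like this. It assumes both files come from db-out, that matching records share an `_input_hash` (falling back to the raw text), and that a hit means identical start, end and label. The function names are hypothetical.

```python
import json
from collections import Counter, defaultdict


def read_jsonl(path):
    """Load one db-out export as a list of dicts."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def spans_by_input(records):
    """Map each accepted example to its set of (start, end, label) spans."""
    out = {}
    for rec in records:
        # Records without an "answer" key (e.g. written directly by a custom recipe) count as accepted
        if rec.get("answer", "accept") != "accept":
            continue
        key = rec.get("_input_hash", rec["text"])
        out[key] = {(s["start"], s["end"], s["label"]) for s in rec.get("spans", [])}
    return out


def confusion_counts(gold_records, pred_records):
    """Count exact-match TP/FP/FN per label; predictions for inputs missing from gold are ignored."""
    gold = spans_by_input(gold_records)
    pred = spans_by_input(pred_records)
    counts = defaultdict(Counter)
    for key, gold_spans in gold.items():
        pred_spans = pred.get(key, set())
        for _, _, label in gold_spans & pred_spans:
            counts[label]["tp"] += 1
        # A right boundary with the wrong label is an FP for the predicted label and an FN for the gold one
        for _, _, label in pred_spans - gold_spans:
            counts[label]["fp"] += 1
        for _, _, label in gold_spans - pred_spans:
            counts[label]["fn"] += 1
    return {label: {k: c[k] for k in ("tp", "fp", "fn")} for label, c in counts.items()}
```

Usage would be `confusion_counts(read_jsonl("gold.jsonl"), read_jsonl("model.jsonl"))`, with both file names standing in for your own exports.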

Thank you in advance for your attention!

Hi @shulim, welcome to Prodigy!

You can probably still use data-to-spacy just to obtain the .spacy file for evaluation. Once you have the .spacy file for your test/gold data, you can use spacy evaluate to perform the evaluation. So to recap, one thing you can do is:

  • Use data-to-spacy to get .spacy-formatted files for your gold standard data (ignoring the training part),
  • Then use spacy evaluate to perform the actual model evaluation.
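For reference, what spacy evaluate does with that .spacy file can also be done from Python, which is handy if you want the raw scores dict to post-process. A sketch assuming spaCy v3; `evaluate_docbin` is a hypothetical helper and `path` would be the .spacy file written by data-to-spacy.

```python
from spacy.tokens import DocBin
from spacy.training import Example


def evaluate_docbin(nlp, path):
    """Score `nlp` against gold Docs stored in a .spacy file."""
    gold_docs = DocBin().from_disk(path).get_docs(nlp.vocab)
    # The predicted side starts as an unannotated Doc with the same text;
    # nlp.evaluate runs the pipeline on it and compares against the gold Doc.
    examples = [Example(nlp.make_doc(doc.text), doc) for doc in gold_docs]
    return nlp.evaluate(examples)
```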

Thank you for your helpful response @ljvmiranda921! I'll try working with data-to-spacy. Just a clarifying question on the spaCy side (which I realize may not fit here, so apologies if so, but I thought I'd ask): if the model I'm trying to evaluate is already trained, could I add it as a custom component to the pipeline of a blank spaCy model (and then use spacy evaluate), or is there a better way to integrate the model into spaCy?

Hi @shulim, glad it helped!

Quick question: was the model trained with spaCy? If so, you can load it directly instead of "adding" it to a blank spaCy model (if I understood your question correctly). In other words, pass the path to that model instead of blank:en.
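In Python terms, the difference is roughly `spacy.load(path)` versus `spacy.blank("en")`. A small sketch; `load_pipeline` and its `model_path` argument are illustrative only.

```python
import spacy


def load_pipeline(model_path=None):
    """Return a trained pipeline from disk, or a blank English one (Prodigy's "blank:en")."""
    if model_path is None:
        # Tokenizer only: no trained components, so there is nothing to score but tokens
        return spacy.blank("en")
    # A directory saved with nlp.to_disk(), e.g. the model-best folder written by spacy train
    return spacy.load(model_path)
```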

@ljvmiranda921 Thank you for following up! The model was not trained with spaCy (it was trained by someone else), so I've added a custom component in spaCy that calls the model and writes its predictions to the Doc. However, when I run spacy evaluate, I only get token scores (e.g. token_acc, token_p) rather than entity scores (e.g. ents_p, ents_r). So I suspect the gold-standard annotations I converted with data-to-spacy don't store the entity labels in the same place as my custom component's output, which sets them on doc.ents.
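One thing that may be worth checking, assuming spaCy v3: `nlp.evaluate` (which spacy evaluate calls) collects scores from the tokenizer and from each pipeline component that defines a `score` method. A plain function component has no such method, which would explain seeing only token_* scores. Below is a sketch of a class-based wrapper that sets `doc.ents` and reports entity scores through `spacy.scorer.get_ner_prf`. `hypothetical_external_predict` stands in for your model and, like the `external_ner` factory name, is hypothetical.

```python
from spacy.language import Language
from spacy.scorer import get_ner_prf
from spacy.util import filter_spans


def hypothetical_external_predict(text):
    """HYPOTHETICAL stand-in for the external model: returns (start_char, end_char, label) tuples."""
    results = []
    for name in ["Alice", "Bob"]:
        start = text.find(name)
        if start != -1:
            results.append((start, start + len(name), "NAME"))
    return results


class ExternalNER:
    """Wraps the external model so it writes doc.ents and can be scored by nlp.evaluate."""

    def __init__(self, name):
        self.name = name

    def __call__(self, doc):
        spans = []
        for start, end, label in hypothetical_external_predict(doc.text):
            # "contract" keeps only the tokens that lie fully inside the character offsets
            span = doc.char_span(start, end, label=label, alignment_mode="contract")
            if span is not None:
                spans.append(span)
        doc.ents = filter_spans(spans)  # doc.ents cannot contain overlapping spans
        return doc

    def score(self, examples, **kwargs):
        # Without this method the Scorer reports no ents_p / ents_r / ents_f for the component
        return get_ner_prf(examples)


@Language.factory(
    "external_ner",
    default_score_weights={"ents_f": 1.0, "ents_p": 0.0, "ents_r": 0.0},
)
def create_external_ner(nlp, name):
    return ExternalNER(name)
```

With this registered, `nlp.add_pipe("external_ner")` on a blank pipeline should report ents_p, ents_r and ents_f alongside the token scores. Because spacy evaluate loads the pipeline from disk, the file defining the factory has to be importable at load time, for example by passing it with the `--code` option after saving the pipeline with `nlp.to_disk`.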