Do you have any details on the evaluation script used for NER models in Prodigy? Is it a standard confusion matrix, or do you have a more elaborate script?
Is it possible to use a custom evaluation script? I need it in order to compare results obtained using Prodigy with the models reported in the scientific articles I'm following.
It depends on whether you’re evaluating based on binary annotations, or based on the fully-specified manual annotations.
If you’re evaluating the binary annotations, the accuracy score is based on how many of the accepted entities the model got right, and how many of its predicted entities are inconsistent with the annotations (either because they cross a correct entity, or because they match a rejected entity). There will also be some predicted entities that can’t be evaluated either way.
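To make that concrete, here's a rough sketch of that kind of binary scoring logic. The function name, the span representation as `(start, end, label)` character-offset tuples, and the way the counts are combined into a single accuracy figure are my own illustration, not Prodigy's internal code:

```python
def binary_ner_score(predicted, accepted, rejected):
    """Score predicted spans against binary accept/reject annotations.

    All arguments are sets of (start, end, label) character-offset tuples.
    Predictions that neither match nor conflict with any annotation are
    left out of the score, since they can't be evaluated.
    """
    def crosses(span, others):
        # Overlaps an annotated span without matching it exactly
        return any(
            span != other and span[0] < other[1] and other[0] < span[1]
            for other in others
        )

    correct = len(predicted & accepted)
    wrong = sum(
        1 for span in predicted
        if span not in accepted
        and (span in rejected or crosses(span, accepted))
    )
    missed = len(accepted - predicted)
    # One possible way to combine the counts into a single figure;
    # the exact formula Prodigy uses internally may differ.
    total = correct + wrong + missed
    accuracy = correct / total if total else 0.0
    return {"correct": correct, "wrong": wrong,
            "missed": missed, "accuracy": accuracy}
```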
If you’re evaluating the manual annotations, then yes, the evaluation follows the standard precision, recall and F-measure metrics. Of course, it’ll probably be difficult to compare the scores directly against the scientific literature, as you’ll be using a different dataset. You can find accuracy figures for spaCy’s NER (which is what we use in Prodigy), evaluated with a standard methodology, here: https://spacy.io/usage/facts-figures#ner-accuracy-ontonotes5
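If you do want to plug in your own evaluation, a common approach is to export the manual annotations (e.g. with `prodigy db-out`), run your trained model over the same texts with spaCy, and compute exact-match precision, recall and F1 over the entity spans yourself. Here's a minimal sketch, assuming a Prodigy-style JSONL export with `"text"` and `"spans"` fields; the model name and file path are hypothetical placeholders:

```python
import json

import spacy


def load_gold(path):
    """Load texts and gold spans from a Prodigy-style JSONL export."""
    texts, gold = [], []
    with open(path, encoding="utf8") as f:
        for line in f:
            eg = json.loads(line)
            texts.append(eg["text"])
            gold.append(
                {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}
            )
    return texts, gold


def evaluate(nlp, texts, gold):
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    tp = fp = fn = 0
    for doc, gold_spans in zip(nlp.pipe(texts), gold):
        pred = {(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}
        tp += len(pred & gold_spans)
        fp += len(pred - gold_spans)
        fn += len(gold_spans - pred)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f}


if __name__ == "__main__":
    nlp = spacy.load("my_trained_model")            # hypothetical model path
    texts, gold = load_gold("eval_annotations.jsonl")  # hypothetical export path
    print(evaluate(nlp, texts, gold))
```

This computes the same exact-match, micro-averaged span F-score that most NER papers report (CoNLL-style evaluation), so at least the metric itself will be comparable, even if the dataset isn't.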