Prodigy NER model evaluation and custom evaluation scripts

honnibal · June 27, 2018, 10:52pm

It depends on whether you’re evaluating based on binary annotations, or based on the fully-specified manual annotations.

If you’re evaluating the binary annotations, the accuracy score is based on how many of the accepted entities you got right, and how many predicted entities are inconsistent with the annotations (either because they cross a correct entity, or because they match a rejected entity). There will also be some predicted entities that can’t be evaluated.

If you’re evaluating the manual annotations, then yes the evaluation follows the standard precision, recall and F-measure metrics. Of course, it’ll probably be difficult to directly compare the scores against the scientific literature, as you’ll be using a different dataset. You can find figures comparing spaCy’s NER (which is what we use in Prodigy) using a standard methodology here: https://spacy.io/usage/facts-figures#ner-accuracy-ontonotes5

Topic		Replies	Views
Evaluating Precision and Recall of NER ner , solved	6	11927	April 30, 2020
Create baseline metrics based on manual NER annotations usage , ner , solved	3	669	June 8, 2020
Ner evaluation probability threshold usage , ner , spacy	2	427	September 15, 2020
Specific formula for F score, precision and recall NER usage , spacy , training	1	976	July 10, 2021
feature request: pre-trained model evaluation recipe enhancement	2	737	March 27, 2019

Prodigy NER model evaluation and custom evaluation scripts

Related topics