F1 Score Reporting for NER

zach · October 24, 2018, 3:57pm

The ner.batch-train recipe returns an accuracy score, but no F1 score. Additionally, while I can retrieve “right” and “wrong” counts from EntityRecognizer.evaluate(), I don’t see the specific error types that would let me calculate either F1 scores or confusion matrices between categories. Is this information accessible anywhere, or failing that, can I get at the predictions themselves? Thanks!

honnibal · October 25, 2018, 12:16pm

Unfortunately we don’t report those statistics currently. It’s a good idea, and we’ll add it to our list for an upcoming release. In the meantime, you should be able to calculate the accuracies yourself. The best solution would be to group the examples by entity type, and then call that evaluate() method once per group. This should let you calculate per-entity-type accuracies pretty easily, albeit a bit inefficiently (it’ll re-parse the documents N times for N entity types).
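
If you can get at the predicted spans directly, an alternative to repeated evaluate() calls is to count matches per label yourself. The following is only a rough sketch of that idea, not what Prodigy does internally: it assumes gold and predicted entities are available per document as (start, end, label) tuples, and it uses exact span-and-label matching. The function name per_label_prf and the sample data are made up for illustration.

```python
from collections import defaultdict


def per_label_prf(gold_docs, pred_docs):
    """Per-label precision, recall and F1 using exact (start, end, label) matches.

    gold_docs and pred_docs are parallel lists; each item is a list of
    (start, end, label) tuples for one document. This layout is an assumption.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        for _, _, label in gold_set & pred_set:   # predicted and correct
            counts[label]["tp"] += 1
        for _, _, label in pred_set - gold_set:   # predicted but not in gold
            counts[label]["fp"] += 1
        for _, _, label in gold_set - pred_set:   # in gold but not predicted
            counts[label]["fn"] += 1

    scores = {}
    for label, c in counts.items():
        predicted = c["tp"] + c["fp"]
        actual = c["tp"] + c["fn"]
        p = c["tp"] / predicted if predicted else 0.0
        r = c["tp"] / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[label] = {"precision": p, "recall": r, "f1": f1}
    return scores


# Hypothetical sample data: two documents, character offsets plus label.
gold = [[(0, 5, "PERSON"), (10, 16, "ORG")], [(3, 9, "ORG")]]
pred = [[(0, 5, "PERSON"), (10, 16, "PERSON")], [(3, 9, "ORG"), (12, 15, "ORG")]]

scores = per_label_prf(gold, pred)
for label, s in sorted(scores.items()):
    print(label, s)
```

With a loaded spaCy pipeline nlp, the predicted tuples could be built as [(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]. Note that a span with the right boundaries but the wrong label counts as both a false positive for the predicted label and a false negative for the gold label; pairing those up by offsets is one way to start on a confusion matrix.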

zach · October 25, 2018, 1:56pm

Thanks for the response @honnibal. Glad to know I wasn’t missing something obvious.
