Evaluating Precision and Recall of NER

I’m asking something a little different. Like you, I find it easiest to write my own P/R code. (And it appears that Scorer and nlp.evaluate are utilities that calculate P/R from the spaCy data structures.) But additionally I want to calculate P/R at a given threshold t: the model I’m evaluating only counts as hypothesizing an entity if its confidence score for that entity is above t. The goal is to run this for a range of thresholds and plot an ROC-style curve of F-score against threshold.
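
Concretely, the per-threshold calculation I have in mind is something like this sketch (pr_at_threshold, gold_spans, and scored_spans are just names I made up, and exact span matching stands in for whatever matching criterion the real evaluation code uses):

def pr_at_threshold(gold_spans, scored_spans, t):
    # gold_spans: set of (start, end, label) tuples from the annotations.
    # scored_spans: dict mapping (start, end, label) -> model confidence.
    # Only spans the model is confident enough about count as hypotheses.
    predicted = {span for span, score in scored_spans.items() if score > t}
    tp = len(predicted & gold_spans)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score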

The part I can’t figure out is how to get the model to return scores for the entities it hypothesizes. I thought that was what the cats attribute was for, but that doesn’t behave the way I’d expect.

>>> nlp = spacy.load("en_core_web_lg")
>>> doc = nlp("This is America.")
>>> [entity.label_ for entity in doc.ents]
['GPE']
>>> doc.cats
{}
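
As far as I can tell, doc.cats is only populated by a textcat component, which is why it stays empty here; the NER pipe doesn’t seem to write its confidences anywhere on the Doc. The closest thing I’ve found is the beam-parse workaround people suggest for spaCy v2. A sketch (I’m not certain the beam_width/beam_density settings are right, they’re just the values I’ve seen used):

import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_lg")
# Run the pipeline without NER so the doc can be beam-parsed separately.
docs = list(nlp.pipe(["This is America."], disable=["ner"]))
beams = nlp.entity.beam_parse(docs, beam_width=16, beam_density=0.0001)

entity_scores = defaultdict(float)
for doc, beam in zip(docs, beams):
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        # ents is a list of (start_token, end_token, label) candidates;
        # summing the scores of the beams each one appears in gives a
        # per-entity confidence.
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score

That gives me numbers that look like probabilities, but it feels roundabout, which is part of why I’m wondering about EntityRecognizer.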

The documentation and recipe code make it look like the EntityRecognizer is what I want. You initialize it with a model and it then returns entities and scores, but I’m not sure what to pass as input to EntityRecognizer.

That last evaluate function I wrote above does everything I want, except that it uses all the entities the model hypothesizes to calculate the score. I can’t figure out how to select just the subset of entities with a score > t.
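
So if per-entity scores like the entity_scores dict above are the right thing to use, the subset selection seems easy enough, and the curve is just a sweep (reusing the pr_at_threshold sketch from earlier; gold_spans is still a placeholder for my gold annotations):

thresholds = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0
curve = [(t, *pr_at_threshold(gold_spans, entity_scores, t)) for t in thresholds]

What I can’t find is the supported way to get those per-entity scores out of the pipeline in the first place.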