I annotated 130 examples using Prodigy for training and 20 others for testing. I used this scorer function:
```python
from spacy.gold import GoldParse
from spacy.scorer import Scorer

def evaluate(ner_model, examples):
    scorer = Scorer()
    for input_, annot in examples:
        doc_gold_text = ner_model.make_doc(input_)
        gold = GoldParse(doc_gold_text, entities=annot['entities'])
        pred_value = ner_model(input_)
        scorer.score(pred_value, gold)
    return scorer.scores

test_results = evaluate(ner_model, TEST_DATA)
```
The F, P, and R scores from this function are all the same value: 89.28. I am not sure why it would return the same score for all three.
Precision and recall will be the same if the number of predictions is the same as the number of true annotations: both divide the same true-positive count, precision by the number of predicted entities and recall by the number of gold entities. And if precision and recall are the same, then the F-score must be the same value as both of them as well, since the F-score is the harmonic mean of the two.
You probably want to have a look at your predictions and compare them to the gold standard, to see what’s up. It might be that your model only makes mistakes on the entity type, but not the span boundaries, for instance. Or it might be a less interesting coincidence — after all, the evaluation set is quite small.
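To separate those two cases, you can diff the predicted entity tuples against the gold ones. This is a minimal sketch on hypothetical `(start, end, label)` spans — in your setup you would build `predicted` from `doc.ents` and `gold` from the annotations in `TEST_DATA`:

```python
# Compare predicted entity spans against gold spans to separate
# exact matches from label-only errors (same boundaries, wrong type).
def diff_entities(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    exact = predicted & gold
    # Spans that match on (start, end) but carry a different label.
    pred_spans = {(s, e): l for s, e, l in predicted - exact}
    gold_spans = {(s, e): l for s, e, l in gold - exact}
    label_errors = [(span, pred_spans[span], gold_spans[span])
                    for span in pred_spans.keys() & gold_spans.keys()]
    return exact, label_errors

# Hypothetical example data
predicted = [(0, 5, 'ORG'), (10, 16, 'PERSON')]
gold      = [(0, 5, 'ORG'), (10, 16, 'GPE')]
exact, label_errors = diff_entities(predicted, gold)
print(exact)         # {(0, 5, 'ORG')}
print(label_errors)  # [((10, 16), 'PERSON', 'GPE')]
```

If most of your errors land in `label_errors`, the model is getting the boundaries right and only confusing entity types.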