Interpreting ner.batch-train results

Hey,

I’m having a hard time interpreting the results from ner.batch-train.

→ prodigy ner.batch-train long_text_annotations_gold en_core_web_sm --output ./ner/models/model-3-person --eval-split 0.2 --label PERSON --n-iter 10

Loaded model en_core_web_sm
Using 20% of examples (719) for evaluation
Using 100% of remaining examples (2881) for training
Dropout: 0.2  Batch size: 32  Iterations: 10


BEFORE     0.457
Correct    86
Incorrect  102
Entities   1839
Unknown    371


#          LOSS       RIGHT      WRONG      ENTS       SKIP       ACCURACY
01         4.557      166        22         1554       0          0.883
02         2.779      162        26         1581       0          0.862
03         1.982      160        28         1501       0          0.851
04         1.589      159        29         1524       0          0.846
05         1.314      161        27         1523       0          0.856
06         1.639      157        31         1525       0          0.835
07         0.879      159        29         1505       0          0.846
08         0.763      160        28         1521       0          0.851
09         0.665      157        31         1486       0          0.835
10         0.568      159        29         1499       0          0.846

Correct    166
Incorrect  22
Baseline   0.457
Accuracy   0.883

The standard model reports Correct 86 and Incorrect 102. What do these numbers represent, though? Entities? I find the stats confusing, given that I have 719 evaluation examples.

In general, how should I go about interpreting these stats?

It’s been a while since I last touched Prodigy, so pardon the rookie question. Thank you!

Hi Ole,

The stats are a little tricky because the binary ner.teach annotations only provide partial supervision and partial evaluation. If you’ve clicked “accept” on an entity and the model predicts it correctly, we know that’s correct. Likewise, if you clicked “reject” and the model makes that mistake, we know that’s wrong. We also know the model is wrong if it misses an entity you’ve accepted.

However, if you click “reject” on an entity and the model doesn’t predict it, we don’t necessarily know whether the model was correct — it could be wrong for a different reason. We’ll also have many predictions from the model that we don’t have any information about.

The ENTS column shows how many entities the model predicted, and the RIGHT and WRONG columns show how many of those are definitely correct and how many are definitely wrong.
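To make the accounting concrete, here's a minimal sketch of how binary accept/reject annotations could yield "right", "wrong", and "unknown" counts. This is illustrative only — the function and data structures are hypothetical, not Prodigy's internal implementation.

```python
# Hypothetical sketch: scoring model predictions against binary annotations.
# Spans are (start, end, label) tuples; all names are illustrative.

def score_binary(predicted, accepted, rejected):
    """Count definite hits, definite misses, and unknowns."""
    right = len(predicted & accepted)       # accepted and predicted: definitely correct
    wrong = len(predicted & rejected)       # rejected but predicted anyway: definitely wrong
    wrong += len(accepted - predicted)      # accepted but missed by the model: definitely wrong
    # Everything else is unknown: predictions we have no annotation for,
    # plus rejected spans the model didn't predict (it could still be
    # wrong for a different reason, e.g. a different boundary or label).
    unknown = len(predicted - accepted - rejected)
    unknown += len(rejected - predicted)
    return right, wrong, unknown

predicted = {(0, 5, "PERSON"), (10, 15, "PERSON"), (20, 25, "PERSON")}
accepted = {(0, 5, "PERSON"), (30, 35, "PERSON")}
rejected = {(10, 15, "PERSON"), (40, 45, "PERSON")}

print(score_binary(predicted, accepted, rejected))  # → (1, 2, 2)
```

So only the definitely-correct and definitely-wrong counts feed the RIGHT and WRONG columns; everything the annotations can't decide stays unknown.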


So in the case of rejected-and-not-predicted, the phrase might actually belong to some other entity type that wasn’t suggested by ner.teach and was therefore rejected by the user? Or it might simply not belong to any entity type at all?

Cool. So Unknown + Correct + Incorrect = accept + reject actions, and is thus the total number of annotations in the evaluation set?

Thanks again!

Yes, that’s correct.
