This is probably a naive question. After you train a model for NER in Prodigy, you get F1, recall and precision scores. Are these calculated at the token level or the span level? For example, if the model predicts only part of a span (2 tokens out of the 3 it should predict), is the whole prediction counted as wrong, or does it get partial credit? Thanks in advance!
No, this is a good question and the documentation could be improved here.
It's a micro-PRF on the span level. If any part of the span is wrong (start / end / label), the whole span is counted as wrong.
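To make that concrete, here is a minimal sketch of span-level micro precision/recall/F1. The function name and the `(start, end, label)` tuple representation are my own choices for illustration, not Prodigy's internal API, but the matching rule is the one described above: a predicted span only counts as correct if start, end and label all match exactly.

```python
def micro_prf(gold, pred):
    """Span-level micro precision/recall/F1.

    gold and pred are collections of (start, end, label) tuples,
    flattened over the whole evaluation set. A predicted span counts
    as a true positive only if start, end AND label all match a gold
    span exactly; partial overlaps earn no credit.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Gold span covers three tokens, the prediction only covers two of them:
gold = [(0, 3, "PERCENT")]   # tokens 0..2
pred = [(0, 2, "PERCENT")]   # partial overlap -> counted as wrong
print(micro_prf(gold, pred))  # (0.0, 0.0, 0.0)
```

So in your 2-out-of-3-tokens example, that prediction is both a false positive and a missed gold span, hurting precision and recall at the same time.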
Hey Adriana, after doing some manual inspection of the evaluation dataset and the model's predictions: the model's performance on percentages shows 100% precision, recall and F1, but when I run `ner.correct` on the same evaluation dataset, I see that for percentages it only grabs the value without the sign (only "10" and not "10 %"). I checked the evaluation dataset and I'm certain it should label "10 %", so are you really sure this performance is calculated on the span level?
I am sure that the evaluation is on the span level.
It's hard to tell from a distance what's going on with the example above. Can you try running the model in spaCy on that exact evaluation text and inspect the annotated entity spans?
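Something like the following inspection pattern would show exactly which characters the model is labelling. In practice you would load your trained pipeline, e.g. `spacy.load("./model-best")` (that path is an assumption); here a blank pipeline with an `entity_ruler` stands in for the model so the snippet is runnable on its own.

```python
import spacy

# Stand-in for a trained pipeline: a blank English pipeline plus an
# entity_ruler that labels the bare number "10", mimicking a model
# that drops the "%" sign from the span.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERCENT", "pattern": [{"TEXT": "10"}]}])

doc = nlp("Revenue grew by 10 % last year.")
for ent in doc.ents:
    # Print the exact text and character offsets of each predicted span.
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```

If the predicted span is "10" while the gold span in the evaluation data is "10 %", the boundaries differ and span-level scoring should count it as wrong, so a 100% score on that label would be surprising and worth digging into.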