I know that Prodigy uses span-level evaluation, in the sense that a span counts as wrong if its start, end or label is wrong. Is it possible to get more details about the evaluation formulas? I'm trying to compare Prodigy NER results with BERT NER, so the specific formulas for recall, precision and F-score would be great.
Hi! If you're training from manually created annotations, the evaluation all happens within spaCy and doesn't depend on Prodigy. spaCy uses a very standard NER evaluation. If you're working with spaCy v2.x, you can view the code here:
For spaCy v3.x, it's here:
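For reference, the evaluation is exact-match at the span level: a predicted entity only counts as a true positive if its start, end and label all match a gold entity. The metrics themselves are just the standard precision/recall/F formulas (nothing Prodigy-specific):

```
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
F-score   = 2 * precision * recall / (precision + recall)
```

where TP = predicted spans that exactly match a gold span, FP = predicted spans with no exact gold match, and FN = gold spans that no prediction matched exactly.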
If you want to do a comparative evaluation, you can also just run both your models over your evaluation data and then calculate the accuracy however you like, as long as you do it consistently for both evaluations.
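As a rough sketch of what that manual comparison could look like (assuming you've already turned each model's output and the gold annotations into lists of `(start, end, label)` tuples per document; the function and variable names here are just placeholders):

```python
def prf(gold_spans, pred_spans):
    """Exact-match span evaluation: a prediction only counts as correct
    if start, end and label all match a gold span."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_spans, pred_spans):
        gold_set = set(gold)   # each span is a (start, end, label) tuple
        pred_set = set(pred)
        tp += len(gold_set & pred_set)
        fp += len(pred_set - gold_set)
        fn += len(gold_set - pred_set)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return precision, recall, f_score

# Score both models against the same gold spans so the numbers
# are directly comparable:
# spacy_p, spacy_r, spacy_f = prf(gold_spans, spacy_pred_spans)
# bert_p, bert_r, bert_f = prf(gold_spans, bert_pred_spans)
```

The important part is that the gold spans and the matching criterion are identical for both models, otherwise the comparison isn't meaningful.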
Something to keep in mind here: if you're using a non-spaCy model with a tokenizer that doesn't preserve the original text, this may impact your evaluation. It probably also makes sense to train with spaCy v3 directly (you can use `prodigy data-to-spacy` and `spacy convert` to convert your annotations), so you can train a transformer-based pipeline that's more directly comparable to another model initialised with transformer weights. Otherwise, your evaluation might not be very meaningful.