Specific formula for F score, precision and recall NER

ines · July 10, 2021, 4:55am

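Hi! If you're training from manually created annotations, the evaluation all happens within spaCy and doesn't depend on Prodigy. spaCy uses a standard entity-level NER evaluation: precision, recall and F-score are computed from the predicted entities compared against the gold-standard entities. If you're working with spaCy v2.x, you can view the code here: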

https://github.com/explosion/spaCy/blob/v2.x/spacy/scorer.py

# coding: utf8
from __future__ import division, print_function, unicode_literals

import numpy as np

from .gold import tags_to_entities, GoldParse
from .errors import Errors


class PRFScore(object):
    """
    A precision / recall / F score
    """

    def __init__(self):
        self.tp = 0
        self.fp = 0
        self.fn = 0

    def score_set(self, cand, gold):

This file has been truncated.
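For reference, the usual definitions behind the `tp`, `fp` and `fn` counters in the excerpt above are given below. This is the textbook formulation, written under the assumption that `TP` counts predicted entities that match a gold entity, `FP` counts predicted entities with no gold match, and `FN` counts gold entities that were not predicted. The exact matching rules and any zero-division handling should be checked in the linked file.

$$
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot P \cdot R}{P + R}
$$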

For spaCy v3.x, it's here:

https://github.com/explosion/spaCy/blob/master/spacy/scorer.py

from collections import defaultdict
from typing import (
    TYPE_CHECKING,
    Any,
    Callable,
    Dict,
    Iterable,
    List,
    Optional,
    Set,
    Tuple,
)

import numpy as np

from .errors import Errors
from .morphology import Morphology
from .tokens import Doc, Span, Token
from .training import Example
from .util import SimpleFrozenList, get_lang_class

This file has been truncated.
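Since both excerpts are cut off before the actual scoring logic, here is a minimal sketch of how set-based precision, recall and F-score are typically accumulated. It is not spaCy's code: the class name `PRFSketch`, the `Entity` tuple layout and the zero-division handling are assumptions made for illustration, and only the `tp` / `fp` / `fn` counters and the `score_set(cand, gold)` signature are taken from the v2.x excerpt.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (label, start, end). An entity only
# counts as a match if all three fields are identical in gold and prediction.
Entity = Tuple[str, int, int]


class PRFSketch:
    """Accumulates true/false positives and false negatives across documents."""

    def __init__(self) -> None:
        self.tp = 0
        self.fp = 0
        self.fn = 0

    def score_set(self, cand: Set[Entity], gold: Set[Entity]) -> None:
        self.tp += len(cand & gold)  # predicted and present in gold
        self.fp += len(cand - gold)  # predicted but not in gold
        self.fn += len(gold - cand)  # in gold but not predicted

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def fscore(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


# One document: the ORG prediction has the wrong end boundary, so it counts
# as both a false positive and a false negative.
gold = {("PERSON", 0, 2), ("ORG", 5, 6), ("GPE", 9, 10)}
pred = {("PERSON", 0, 2), ("ORG", 5, 7), ("GPE", 9, 10)}

score = PRFSketch()
score.score_set(pred, gold)
print(score.precision, score.recall, score.fscore)
```

Because the counters are summed over all documents before dividing, this sketch gives micro-averaged scores. A boundary error is penalised twice (once in precision, once in recall), which is worth keeping in mind when reading the numbers.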

If you want to do a comparative evaluation, you can also just run both your models over the same evaluation data and then calculate the scores however you want to, as long as you do it consistently for both models.
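A rough sketch of such a comparison might look like the following. It assumes you have already exported each model's predictions and the gold annotations as sets of `(start_char, end_char, label)` tuples per document. The function name `prf`, the variable names and the toy data are all made up for illustration, and how you get the predictions out of each model is left open.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span format: character offsets into the original text plus a
# label. Character offsets keep the comparison independent of tokenization.
Span = Tuple[int, int, str]


def prf(gold_docs: List[Set[Span]], pred_docs: List[Set[Span]]) -> Dict[str, float]:
    """Micro-averaged exact-match precision, recall and F-score."""
    if len(gold_docs) != len(pred_docs):
        raise ValueError("gold and predictions must cover the same documents")
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"p": p, "r": r, "f": f}


# Toy data: two documents, three gold entities in total.
gold_docs = [{(0, 5, "ORG")}, {(10, 16, "PERSON"), (20, 26, "GPE")}]
model_a = [{(0, 5, "ORG")}, {(10, 16, "PERSON")}]  # misses the GPE
model_b = [{(0, 5, "ORG")}, {(10, 16, "PERSON"), (20, 25, "GPE")}]  # GPE boundary off by one

# Both models go through the same function and the same gold data.
results = {
    name: prf(gold_docs, preds)
    for name, preds in [("model_a", model_a), ("model_b", model_b)]
}
for name, scores in results.items():
    print(name, scores)
```

The important part is less the exact metric and more that both models are scored by one function against one gold set, so any difference in the numbers comes from the models and not from the scoring.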

One thing to keep in mind here: if you're using a non-spaCy model with a tokenizer that doesn't preserve the original text, this may impact your evaluation, since the predicted entity boundaries may not line up with your annotations. It probably also makes sense to train with spaCy v3 directly (you can use prodigy data-to-spacy and spacy convert to convert your annotations), so you can train a transformer-based pipeline that's more directly comparable to another model initialised with transformer weights. Otherwise, your evaluation might not be very meaningful.
