ner.eval-ab input: should that be the same text as the evaluation dataset?

Arul · November 29, 2018, 8:28pm

I am trying to compare two models. I have a gold evaluation dataset.

prodigy ner.eval-ab gold_eval_500 trained_model1 trained_model1

This seems to expect input data from stdin. I expected the texts to be evaluated to come from the evaluation dataset. Should I separate out the texts from the evaluation set and pass them as the input text?

ines · November 30, 2018, 1:13pm

Hi! Sorry if this was confusing – the idea of the ner.eval-ab recipe is that it lets you run a quick “live evaluation” with two models, by comparing the output on the given input data.

So instead of having to create the gold-standard evaluation set from scratch, you can quickly click through a bunch of examples and already get an idea of how your models are performing. Because the feedback you give is binary (e.g. green or red), the evaluation process also lets you capture which analysis is better and which model’s output you or the annotator preferred overall. Even two models with similar accuracy scores can produce different parses – and one model’s analysis could be much better than the other’s, even if they both make the same number of mistakes in total.

tl;dr: Yes, the quickest way to use ner.eval-ab would probably be to extract the texts from your existing set and load them in as the input data (fourth argument), and use a new dataset to store the new AB annotations you create with the recipe. I’d recommend starting off with a few hundred AB annotations and repeating the process every once in a while as you update and train new models.
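
As a rough sketch of the "extract the texts" step: assuming you've exported the gold set to a JSONL file first (for example with prodigy db-out gold_eval_500 > gold_eval_500.jsonl) and that each record has a "text" key, something like the following would write one text per line, skipping duplicates. The file names and the extract_texts helper are made up for illustration and aren't part of Prodigy.

```python
import json
from pathlib import Path


def extract_texts(gold_path, out_path):
    """Write one {"text": ...} record per unique text found in gold_path.

    Assumes gold_path is a JSONL export where each line is a JSON object
    with a "text" key. Returns the number of texts written.
    """
    seen = set()
    texts = []
    with Path(gold_path).open(encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the export
            record = json.loads(line)
            text = record.get("text")
            # The same text can appear several times in an annotated set,
            # so keep only the first occurrence and preserve the order.
            if text and text not in seen:
                seen.add(text)
                texts.append(text)
    with Path(out_path).open("w", encoding="utf8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}) + "\n")
    return len(texts)


if __name__ == "__main__":
    # Hypothetical file names; adjust to wherever you exported your data.
    gold_file = Path("gold_eval_500.jsonl")
    if gold_file.exists():
        count = extract_texts(gold_file, "eval_texts.jsonl")
        print(f"Wrote {count} texts to eval_texts.jsonl")
```

You could then pass the resulting file as the fourth argument and store the AB annotations in a fresh dataset, along the lines of prodigy ner.eval-ab ab_eval trained_model1 trained_model2 eval_texts.jsonl (here ab_eval and trained_model2 are placeholder names).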
