ner.eval-ab input: should that be the same text as the evaluation dataset?

I am trying to compare two models. I have a gold evaluation dataset.

prodigy ner.eval-ab gold_eval_500 trained_model1 trained_model2

This seems to expect input data from stdin. I expected the text used for the evaluation to come from the evaluation dataset. Should I separate out the texts from the evaluation set and pass them in as the input text?

Hi! Sorry if this was confusing – the idea of the ner.eval-ab recipe is that it lets you run a quick “live evaluation” of two models by comparing their output on the given input data.

So instead of having to create a gold-standard evaluation set from scratch, you can quickly click through a bunch of examples and already get an idea of how your models are performing. Because the feedback you give is binary (green or red), the evaluation also captures which analysis is better and which model’s output you or the annotator preferred overall. Even two models with similar accuracy scores can produce different parses – and one model’s analysis can be much better than the other’s, even if they both make the same number of mistakes in total.

tl;dr: Yes, the quickest way to use ner.eval-ab would probably be to extract the texts from your existing set, load them in as the input data (the fourth argument) and use a new dataset to store the new AB annotations you create with the recipe. I’d recommend starting off with a few hundred AB annotations and repeating the process every once in a while as you update and train new models :slightly_smiling_face:
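
For example, here’s a minimal sketch of how you could pull just the texts out of your existing gold set using Prodigy’s database API – the dataset name `gold_eval_500` and the output file `eval_texts.jsonl` are only placeholders for whatever you’re actually using:

```python
# Sketch: export the raw texts from an existing annotated dataset so they
# can be re-used as the input source for ner.eval-ab.
import srsly
from prodigy.components.db import connect

db = connect()  # connect to the Prodigy database
examples = db.get_dataset("gold_eval_500")  # all examples in the gold set

# Keep only the text of each example – the AB recipe adds the two model
# analyses itself, so the existing entity annotations aren't needed here.
texts = [{"text": eg["text"]} for eg in examples]
srsly.write_jsonl("eval_texts.jsonl", texts)
```

You could then run something like `prodigy ner.eval-ab ab_eval trained_model1 trained_model2 eval_texts.jsonl`, where `ab_eval` is a new dataset that will hold the AB annotations.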
