Gold annotation, Test/Eval set for already trained model

Kasra · May 13, 2019, 9:17am

Hi Guys,

In step one, I trained a model from scratch with gold annotations (almost 6,500 annotations). Training went fine and I have the final model based on my annotations. Now, in step two, I took a new piece of text, created gold annotations for it and saved them. I want to test the Prodigy model trained in step one against this new gold annotation set to see how accurate the model is. Is there a way or recipe for doing this?

Thanks

ines · May 13, 2019, 9:35am

If you want to use the new gold standard evaluation set to evaluate during training, you can pass it in as the --eval-id argument to ner.batch-train.

If you only want to evaluate an already trained model, you could use a custom recipe like this:

The recipe above takes the name of the dataset containing your evaluation examples and the model you trained on your training examples, and then outputs the evaluation results.
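As a rough sketch of what such an evaluation does (not the recipe referred to above), the function below compares predicted entity spans against gold spans and computes precision, recall and F-score on exact matches. It assumes each example is a Prodigy-style task dict with `"text"` and `"spans"` (each span having `"start"`, `"end"` and `"label"`), and that `predict` is a hypothetical callable that runs your trained model on a text and returns spans in the same format.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start char offset, end char offset, label)


def to_span_set(spans: Iterable[Dict]) -> Set[Span]:
    """Convert Prodigy-style span dicts into hashable (start, end, label) tuples."""
    return {(s["start"], s["end"], s["label"]) for s in spans}


def evaluate(examples: Iterable[Dict], predict: Callable[[str], List[Dict]]) -> Dict[str, float]:
    """Score exact-match entity predictions against gold-standard examples."""
    tp = fp = fn = 0
    for eg in examples:
        gold = to_span_set(eg.get("spans", []))
        pred = to_span_set(predict(eg["text"]))
        tp += len(gold & pred)  # predicted and present in the gold data
        fp += len(pred - gold)  # predicted but not in the gold data
        fn += len(gold - pred)  # in the gold data but missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f_score": f_score}
```

With spaCy, `predict` could be something like `lambda text: [{"start": e.start_char, "end": e.end_char, "label": e.label_} for e in nlp(text).ents]`. Loading the gold examples via Prodigy's database (for example `connect().get_dataset("your_eval_set")` from `prodigy.components.db`, keeping only examples whose `"answer"` is `"accept"`) is an assumption; check the database API for your Prodigy version.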

Kasra · May 14, 2019, 8:23am

Thanks for your reply, Ines. I passed the evaluation dataset via the --eval-id argument to ner.batch-train and got the results during training. I will try the custom recipe today.

I have a question regarding --eval-id: is there any way to print out a list of the missed and wrong entities when we have an already annotated (gold-standard) test set? For example, a simple CSV with the correct matches in the first column, the wrong predictions in the second and the missed entities in the third. Do we need to write our own recipe for that, or could we modify ner.batch-train to print out the CSV for us?

Thanks
Kasra

honnibal · May 14, 2019, 4:34pm

We had a built-in recipe that did something similar to that during an early beta, but we had too many NER recipes so we consolidated things to avoid confusion.

If you want to get a quick readable summary, you might find the prodigy.components.printers.pretty_print_ner function useful. If you mark the spans with an answer key, which should have one of the values "accept", "reject" or "ignore", the spans will be coloured by correctness. I would set correct predictions to "accept" and false predictions to "reject". You could also list the false negatives at the end of the text (these might overlap with the predicted annotations, so you can't easily show them in-line).
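A minimal sketch of that marking step, assuming Prodigy-style task dicts (`"text"` plus `"spans"` with character offsets and a `"label"`) and a hypothetical `predict` function that runs your model on a text and returns spans in the same format. It only builds the marked examples; they could then be passed to `pretty_print_ner`, but check that function's exact arguments in the Prodigy documentation for your version.

```python
from typing import Callable, Dict, List


def mark_predictions(eg: Dict, predict: Callable[[str], List[Dict]]) -> Dict:
    """Return a copy of a gold example whose spans are the model's predictions,
    each marked "accept" (correct) or "reject" (false positive)."""
    text = eg["text"]
    gold = {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}
    spans = []
    for s in predict(text):
        key = (s["start"], s["end"], s["label"])
        spans.append({**s, "answer": "accept" if key in gold else "reject"})
    predicted = {(s["start"], s["end"], s["label"]) for s in spans}
    missed = sorted(gold - predicted)  # false negatives
    display_text = text
    if missed:
        # False negatives may overlap predicted spans, so list them after the
        # text instead of in-line. Appending leaves existing offsets unchanged.
        display_text += "\n\nMISSED: " + "; ".join(
            f"{text[a:b]} [{label}]" for a, b, label in missed)
    return {"text": display_text, "spans": spans}
```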

The loop to run the model and compare against the gold-standard should be pretty simple. You can have a look at my sample code for calculating precision/recall/F-score evaluation figures in this thread for reference: Recall and Precision (TN, TP, FN, FP)
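Along those lines, a minimal sketch of a comparison loop that produces the CSV asked for above: one row per example, with the correct, wrong and missed entities in separate columns. It assumes Prodigy-style task dicts with character-offset `"spans"` and a hypothetical `predict` function that runs your model on a text and returns spans in the same format; matching is exact on start, end and label.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start char offset, end char offset, label)


def span_set(spans: Iterable[Dict]) -> Set[Span]:
    """Turn Prodigy-style span dicts into hashable (start, end, label) tuples."""
    return {(s["start"], s["end"], s["label"]) for s in spans}


def errors_to_csv(examples: Iterable[Dict], predict: Callable[[str], List[Dict]]) -> str:
    """Return CSV text with one row per example: correct, wrong and missed entities."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["text", "correct", "wrong", "missed"])
    for eg in examples:
        text = eg["text"]
        gold = span_set(eg.get("spans", []))
        pred = span_set(predict(text))
        # correct = true positives, wrong = false positives, missed = false negatives
        groups = (gold & pred, pred - gold, gold - pred)
        cells = ["; ".join(f"{text[a:b]} [{label}]" for a, b, label in sorted(group))
                 for group in groups]
        writer.writerow([text] + cells)
    return out.getvalue()
```

To save it, write the result to a file opened with `newline=""` (for example `open("errors.csv", "w", encoding="utf8", newline="")`), since the csv module already emits its own line endings.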
