Gold annotation, Test/Eval set for already trained model

We had a built-in recipe that did something similar during an early beta, but we ended up with too many NER recipes, so we consolidated things to avoid confusion.

If you want a quick, readable summary, you might find the prodigy.components.printers.pretty_print_ner function useful. If you mark each span with an "answer" key, whose value should be one of "accept", "reject" or "ignore", the spans will be coloured by correctness. I would set the correct predictions to "accept" and the false positives to "reject". You could list the false negatives at the end of the text as well (these might overlap with the predicted annotations, so you can’t easily show them in-line).
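Here's a minimal sketch of how that might look. I'm assuming your examples are dicts in Prodigy's JSON format with "text" and "spans" keys, and the example data below is made up for illustration (the exact call signature of pretty_print_ner may vary slightly between versions):

```python
from prodigy.components.printers import pretty_print_ner

# Hypothetical evaluation examples: each span carries an extra "answer"
# key marking whether the model's prediction was correct.
examples = [
    {
        "text": "Apple is opening a new store in San Francisco.",
        "spans": [
            # True positive: the model got this right.
            {"start": 0, "end": 5, "label": "ORG", "answer": "accept"},
            # False positive: the model predicted this incorrectly.
            {"start": 32, "end": 45, "label": "PERSON", "answer": "reject"},
        ],
    }
]

# Print a coloured, human-readable summary of the spans to the terminal.
pretty_print_ner(examples)
```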

The loop to run the model and compare its output against the gold standard should be pretty simple. You can have a look at my sample code for calculating precision/recall/F-score evaluation figures in this thread for reference: Recall and Precision (TN, TP, FN, FP)
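As a rough sketch (not the exact code from that thread), the loop could look something like this, assuming a trained spaCy pipeline and gold examples in Prodigy's format with "text" and "spans" keys:

```python
import spacy

def evaluate(nlp, gold_examples):
    """Compare predicted entities against gold spans and return P/R/F."""
    tp = fp = fn = 0
    for eg in gold_examples:
        doc = nlp(eg["text"])
        gold = {(s["start"], s["end"], s["label"]) for s in eg["spans"]}
        pred = {(e.start_char, e.end_char, e.label_) for e in doc.ents}
        tp += len(gold & pred)  # predicted and present in the gold data
        fp += len(pred - gold)  # predicted but not in the gold data
        fn += len(gold - pred)  # in the gold data but missed by the model
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) else 0.0
    )
    return precision, recall, f_score

nlp = spacy.load("en_core_web_sm")  # swap in your own trained model here
```

Note that this scores exact span matches only: a prediction with the right label but slightly different boundaries counts as both a false positive and a false negative.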