We had a built-in recipe that did something similar to this during an early beta, but we ended up with too many NER recipes, so we consolidated things to avoid confusion.
If you want a quick, readable summary, you might find the `prodigy.components.printers.pretty_print_ner` function useful. If you mark the spans with an `"answer"` key, whose value should be one of `"accept"`, `"reject"` or `"ignore"`, the spans will be coloured by correctness. I would set the correct predictions to `"accept"` and the false predictions to `"reject"`. You could list the false negatives at the end of the text as well (these might overlap with the predicted annotations, so you can't easily show them in-line).
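Very roughly, that could look like the sketch below. The example dict layout (a list of task dicts with `"text"` and `"spans"`) and the exact call signature of `pretty_print_ner` are assumptions here, so check the printers module in your Prodigy version:

```python
from prodigy.components.printers import pretty_print_ner

# Hypothetical example data: one task dict whose predicted spans are
# marked by correctness via the "answer" key. Assumes pretty_print_ner
# accepts a list of task dicts with "text" and "spans".
examples = [
    {
        "text": "Apple is opening a new office in London next year.",
        "spans": [
            # correct prediction -> shown as accepted
            {"start": 0, "end": 5, "label": "ORG", "answer": "accept"},
            # false prediction -> shown as rejected
            {"start": 33, "end": 39, "label": "PERSON", "answer": "reject"},
        ],
    }
]

pretty_print_ner(examples)
```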
The loop to run the model and compare against the gold-standard should be pretty simple. You can have a look at my sample code for calculating precision/recall/F-score evaluation figures in this thread for reference: Recall and Precision (TN, TP, FN, FP)
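As a starting point, here's a minimal sketch of such a loop. It assumes a spaCy pipeline and gold data as a list of dicts with `"text"` and gold `"spans"` (e.g. exported from a Prodigy dataset); the function name and exact-match comparison are just illustrative choices, and the linked thread has the fuller version:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # replace with your trained model

def evaluate(gold_examples):
    """Compare predicted entities against gold spans on exact
    (start, end, label) matches and return precision/recall/F-score."""
    tp = fp = fn = 0
    for eg in gold_examples:
        doc = nlp(eg["text"])
        predicted = {(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}
        gold = {(span["start"], span["end"], span["label"]) for span in eg["spans"]}
        tp += len(predicted & gold)   # true positives
        fp += len(predicted - gold)   # false positives
        fn += len(gold - predicted)   # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return precision, recall, f_score
```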