Hi! The comparison of the model's predictions against the correct answers in the evaluation data only happens during evaluation and isn't typically saved out with the model, because it's only relevant for computing the score.
However, you can always get that information by running your trained model over your evaluation data and comparing its predictions (e.g. doc.cats) against the correct answers in your evaluation data. If the model predicts something that's not in the evaluation data, that's a false positive. If the evaluation data contains something that's not predicted by your model, that's a false negative.
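For text categories, a minimal standalone sketch of that comparison (the dicts here are made-up stand-ins for doc.cats, and storing one gold label per example as eg["label"] is an assumption about your data format):

```python
def cats_errors(predicted_cats, gold_label, threshold=0.5):
    # Labels the model assigned with a score at or above the threshold
    predicted = {label for label, score in predicted_cats.items() if score >= threshold}
    false_positives = predicted - {gold_label}   # predicted but not in the gold data
    false_negatives = {gold_label} - predicted   # in the gold data but not predicted
    return false_positives, false_negatives

# doc.cats-style dict with made-up scores for illustration
fp, fn = cats_errors({"POSITIVE": 0.9, "NEGATIVE": 0.2}, "NEGATIVE")
print(fp, fn)  # {'POSITIVE'} {'NEGATIVE'}
```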
For example, if you're annotating named entities, you could do something like this:
for eg in examples:
    doc = nlp(eg["text"])
    predicted_tuples = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    gold_tuples = [(span["start"], span["end"], span["label"]) for span in eg.get("spans", [])]
    # Output the information however you like
    for ent in predicted_tuples:
        if ent not in gold_tuples:  # predicted by the model but not in evaluation data
            print("false positive:", ent)
    for ent in gold_tuples:
        if ent not in predicted_tuples:  # in evaluation data but not predicted
            print("false negative:", ent)
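Once you have the two lists of tuples, you can also derive precision, recall and F-score from the overlaps. A minimal standalone sketch (the example tuples are made up; in practice you'd pass in the predicted_tuples and gold_tuples from above):

```python
def prf(predicted_tuples, gold_tuples):
    # True positives: spans the model predicted that are also in the gold data
    tp = len(set(predicted_tuples) & set(gold_tuples))
    fp = len(set(predicted_tuples) - set(gold_tuples))  # false positives
    fn = len(set(gold_tuples) - set(predicted_tuples))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score

# Made-up example tuples of (start_char, end_char, label)
predicted = [(0, 5, "PERSON"), (10, 16, "ORG")]
gold = [(0, 5, "PERSON"), (20, 26, "GPE")]
print(prf(predicted, gold))  # (0.5, 0.5, 0.5)
```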