show false negative/false positives in NER

cristianmtr · July 22, 2020, 12:46pm

Hey

I am trying to find out which of the entities annotated for NER are either skipped (false negatives) or which pieces of text the model is incorrectly picking up as entities (false positives). Is there an easy way to do this via the Prodigy/spaCy API?

I hacked my way through the code a bit but couldn't find anything. The closest I got in the train recipe was the scores object, but that only contains the final scores, not the predictions themselves. It would be really nice to store the predictions, so we could compute other metrics and plots (confusion matrix, etc.).

ines · July 24, 2020, 8:32am

Hi! There's no built-in function for this at the moment, but it should be pretty straightforward to implement. You probably want to do this as a separate step, though, after you've trained the model – and you probably also want to use a separate dedicated evaluation set instead of just doing a random split so you can compare the results more reliably (if you're not doing that already).

To get the false positives/negatives, you can then process your evaluation data with your trained model and compare the "spans" against the predicted doc.ents:

import spacy

# your_evaluation_data: Prodigy-style examples, each a dict with "text" and "spans"
data_tuples = ((eg["text"], eg) for eg in your_evaluation_data)
nlp = spacy.load("./your_trained_model")
for doc, eg in nlp.pipe(data_tuples, as_tuples=True):
    # Gold spans from the annotations vs. entities predicted by the model,
    # both as (start_char, end_char, label) tuples
    correct_ents = [(e["start"], e["end"], e["label"]) for e in eg.get("spans", [])]
    predicted_ents = [(e.start_char, e.end_char, e.label_) for e in doc.ents]
    for ent in predicted_ents:
        if ent not in correct_ents:
            print("False positive:", ent)
    for ent in correct_ents:
        if ent not in predicted_ents:
            print("False negative:", ent)
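If you want totals rather than a printed list, a minimal sketch (an assumption on my part, not a built-in spaCy or Prodigy feature) is to collect the `correct_ents` / `predicted_ents` lists from the loop above as pairs and tally them per label with sets. The helper name `count_errors` is hypothetical. It uses the same exact-match criterion as the loop: start, end and label must all agree, so a span with the right boundaries but the wrong label counts as one false positive for the predicted label and one false negative for the gold label.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# (start_char, end_char, label), the same tuples the loop above builds
Span = Tuple[int, int, str]


def count_errors(pairs: Iterable[Tuple[Iterable[Span], Iterable[Span]]]) -> Dict[str, Counter]:
    """Tally TP/FP/FN per label from (gold_spans, predicted_spans) pairs.

    Hypothetical helper: a span is a true positive only if start, end
    and label all match exactly.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for gold, predicted in pairs:
        gold_set, pred_set = set(gold), set(predicted)
        # Present in both: true positive for that label
        for _, _, label in gold_set & pred_set:
            counts[label]["tp"] += 1
        # Predicted but not annotated: false positive
        for _, _, label in pred_set - gold_set:
            counts[label]["fp"] += 1
        # Annotated but not predicted: false negative
        for _, _, label in gold_set - pred_set:
            counts[label]["fn"] += 1
    return dict(counts)
```

In the loop above you would append `(correct_ents, predicted_ents)` to a list and pass that list in. For a single pair with gold `[(0, 5, "ORG"), (10, 15, "PERSON")]` and prediction `[(0, 5, "ORG"), (10, 15, "ORG")]`, ORG ends up with one TP and one FP, and PERSON with one FN.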

euricocovas · March 8, 2021, 5:26pm

Hi, the above does not seem to work in spaCy 3.0.3. I tried:

def confusion_matrix(your_evaluation_data=None, ner_model=None, nameForNewLabel='PRODUCTS'):
    tp, fp, fn, tn = 0, 0, 0, 0
    data_tuples = [(eg.text, eg) for eg in your_evaluation_data]
    # see https://spacy.io/api/language#pipe
    for doc, example in ner_model.pipe(data_tuples, as_tuples=True):
        # correct_ents
        ents_x2y = example.get_aligned_spans_x2y(example.reference.ents)
        correct_ents = [(e.start_char, e.end_char, e.label_) for e in ents_x2y]
        # predicted_ents
        ents_x2y = example.get_aligned_spans_x2y(doc.ents)
        predicted_ents = [(e.start_char, e.end_char, e.label_) for e in ents_x2y]
        for ent in predicted_ents:
            if ent not in correct_ents:
                print("False positive:", ent)
        for ent in correct_ents:
            if ent not in predicted_ents:
                print("False negative:", ent)
        # false positives
        fp += len([ent for ent in predicted_ents if ent not in correct_ents])
        # true positives
        tp += len([ent for ent in correct_ents if ent in predicted_ents])
        # false negatives
        fn += len([ent for ent in correct_ents if ent not in predicted_ents])
        # true negatives
        tn += len([ent for ent in predicted_ents if ent in correct_ents])

    return tp, fp, fn, tn

but even after a lot of effort, the numbers I compute from these counts do not match the values I get from

    # test_data: a list of spacy.training.Example objects
    scores_testing = ner_model.evaluate(test_data)
    print("scores_testing")
    print(scores_testing)
    precision_test = scores_testing['ents_per_type'][nameForNewLabel]['p']
    recall_test = scores_testing['ents_per_type'][nameForNewLabel]['r']
    f1_test = scores_testing['ents_per_type'][nameForNewLabel]['f']

Any clues why? Many thanks, Eurico

SofieVL · March 10, 2021, 9:34am

Hi! The custom function you wrote takes a parameter nameForNewLabel, but it looks like that isn't actually being used, so it'll return results aggregated over all labels. That also means you'll have to compare it with scores_testing['ents_p'] (and likewise 'ents_r' and 'ents_f') instead of the label-specific values.

If it still doesn't match - can you paste some actual numbers to check exactly what the difference is?
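The difference between aggregated and label-specific numbers can be made concrete. A sketch, assuming you already have raw `tp`/`fp`/`fn` counts per label (the dict layout and the helper names `prf` and `per_label_and_micro` are hypothetical): precision = tp / (tp + fp), recall = tp / (tp + fn), F1 = 2PR / (P + R). To my understanding, spaCy's overall `ents_p`/`ents_r`/`ents_f` are micro-averaged, i.e. computed from counts summed over all labels, while each entry in `ents_per_type` uses only that label's counts; treat that as an assumption to verify against your spaCy version. Zero denominators return 0.0 here, which may differ slightly from spaCy's handling of empty cases.

```python
from typing import Dict, Mapping, Tuple

PRF = Tuple[float, float, float]


def prf(tp: int, fp: int, fn: int) -> PRF:
    """Precision, recall and F1 from raw counts (0.0 when a denominator is zero)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def per_label_and_micro(counts: Mapping[str, Mapping[str, int]]) -> Tuple[Dict[str, PRF], PRF]:
    """Per-label scores (like ents_per_type) and a micro-average over all labels.

    Assumption: the overall ents_p/ents_r/ents_f correspond to the micro-average,
    i.e. tp/fp/fn summed across labels before computing P/R/F.
    """
    per_label = {
        label: prf(c.get("tp", 0), c.get("fp", 0), c.get("fn", 0))
        for label, c in counts.items()
    }
    totals = {k: sum(c.get(k, 0) for c in counts.values()) for k in ("tp", "fp", "fn")}
    micro = prf(totals["tp"], totals["fp"], totals["fn"])
    return per_label, micro
```

With two labels whose counts differ a lot, the micro-averaged values sit between the two per-label values, so comparing an all-label count against a single label's `p`/`r`/`f` will generally not match.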

euricocovas · March 10, 2021, 2:54pm

Hi, many thanks for your quick answer. It was my mistake: now I get an exact match with spaCy's internal numbers for precision, recall and F1. I changed my code to:

def confusion_matrix(your_evaluation_data=None, ner_model=None):
    tp, fp, fn, tn = 0, 0, 0, 0
    data_tuples = [(eg.text, eg) for eg in your_evaluation_data]
    # see https://spacy.io/api/language#pipe
    for doc, example in ner_model.pipe(data_tuples, as_tuples=True):
        # correct_ents
        ents_x2y = example.get_aligned_spans_x2y(example.reference.ents)
        correct_ents = [(e.start_char, e.end_char, e.label_) for e in ents_x2y]
        # predicted_ents
        ents_x2y = example.get_aligned_spans_x2y(doc.ents)
        predicted_ents = [(e.start_char, e.end_char, e.label_) for e in ents_x2y]
        for ent in predicted_ents:
            if ent not in correct_ents:
                print("False positive:", ent)
        for ent in correct_ents:
            if ent not in predicted_ents:
                print("False negative:", ent)
        # false positives
        fp += len([ent for ent in predicted_ents if ent not in correct_ents])
        # true positives
        tp += len([ent for ent in predicted_ents if ent in correct_ents])
        # false negatives
        fn += len([ent for ent in correct_ents if ent not in predicted_ents])
        # "true negatives": span-level NER has no well-defined TN,
        # so this line just repeats the TP count
        tn += len([ent for ent in correct_ents if ent in predicted_ents])

    return tp, fp, fn, tn

However, it would still be nice to get tp, fp, fn, tn directly out of spaCy; maybe one day you could add that feature. Precision and recall are good to have, but the raw tp, fp, fn, tn counts are even better for detailed debugging. Thanks, and well done on spaCy, an amazing package!!!!

SofieVL · March 10, 2021, 3:53pm

Happy to hear you got it working, and thanks for posting the code as a reference. It may well be useful for others who find this topic later.

We'd have to think about how to get that functionality into spaCy without adding too much overhead during training, because often you really only want the numbers, and the additional information would just take up memory. But yes, you're right that it would be a convenient feature for diving into your model's predictions.

vtorres · May 2, 2022, 7:29pm

Have you been able to implement a version of this in the new version of spaCy? For some reason I'm having a hard time getting the above function to work with the new version of spaCy. What we are looking to get is a confusion matrix for each entity. We really appreciate your help.

Thanks,
Victor
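For a per-entity confusion matrix, one possible approach (a sketch, not something built into spaCy) is to reuse the `correct_ents` / `predicted_ents` tuples from the functions above and pair spans up by character offsets. Spans with identical offsets are counted as (gold label, predicted label); a gold span with no prediction at the same offsets goes in the `O` column (missed), and a predicted span with no gold span at the same offsets goes in the `O` row (spurious). The `O` placeholder and the names `label_confusion` and `print_matrix` are my own, hypothetical choices. Partial overlaps are not matched, so a boundary error shows up as one miss plus one spurious entity.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (start_char, end_char, label), as built in the functions above
Span = Tuple[int, int, str]
NONE = "O"  # hypothetical placeholder meaning "no entity at these offsets"


def label_confusion(pairs: Iterable[Tuple[List[Span], List[Span]]]) -> Counter:
    """Count (gold_label, predicted_label) cells, matching spans by exact offsets."""
    matrix: Counter = Counter()
    for gold, predicted in pairs:
        gold_by_offsets = {(s, e): label for s, e, label in gold}
        pred_by_offsets = {(s, e): label for s, e, label in predicted}
        # Every gold span: matched label, or NONE if nothing was predicted there
        for offsets, gold_label in gold_by_offsets.items():
            matrix[(gold_label, pred_by_offsets.get(offsets, NONE))] += 1
        # Predicted spans with no gold span at the same offsets are spurious
        for offsets, pred_label in pred_by_offsets.items():
            if offsets not in gold_by_offsets:
                matrix[(NONE, pred_label)] += 1
    return matrix


def print_matrix(matrix: Counter) -> None:
    """Print rows = gold labels, columns = predicted labels."""
    labels = sorted({g for g, _ in matrix} | {p for _, p in matrix})
    width = max(len(label) for label in labels + ["gold\\pred"]) + 2
    print("gold\\pred".ljust(width) + "".join(label.rjust(width) for label in labels))
    for g in labels:
        print(g.ljust(width) + "".join(str(matrix[(g, p)]).rjust(width) for p in labels))
```

Diagonal cells are correct entities, off-diagonal cells between two entity labels are right-boundary/wrong-label errors, and the `O` row and column hold the false positives and false negatives.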

koaning · May 3, 2022, 9:38am

If you're interested in discussing spaCy features, you may find more answers in the discussions forum for the spaCy repo on GitHub. In particular, this script that a user made could be immediately helpful here.
