Customizing NER predictions from Spacy for the Scorer function

dsnlp · May 22, 2019, 3:26pm

I am trying to use the Scorer class to calculate F-score, precision and recall (F, P, R) on the predictions I get from my spaCy model. With the code below I load the trained spaCy model and the test data (rows of text) and predict the labels.

import pandas as pd
import spacy

sp_model = spacy.load('/Folder/CustommodeltrainedinSpacy/')
test_data = pd.read_csv('/data.csv')
data_target = test_data['Text_column']  # text column


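# One row per text, one column per entity label
df = pd.DataFrame(columns=['NAME', 'TAG'], index=range(len(data_target)))

for i in range(len(data_target)):
    doc = sp_model(str(data_target[i]))
    for ent in doc.ents:
        if ent.label_ in df.columns:
            # .loc avoids chained assignment; a later entity with the same label overwrites an earlier one
            df.loc[i, ent.label_] = ent.text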

I have annotated data for two custom labels, ‘NAME’ and ‘TAG’, and trained the model on it. The above snippet gives me a dataframe with one column per label, holding the text that was tagged with that label.

                NAME                     TAG
 0              John                    Author
 1              Mike                    Student

Now that I have the predictions from the trained model, how do I evaluate using the scorer function?

import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer

def evaluate(ner_model, examples):
    scorer = Scorer()
    for input_, annot in examples:
        doc_gold_text = ner_model.make_doc(input_)  # tokenize only, no predictions
        gold = GoldParse(doc_gold_text, entities=annot['entities'])
        pred_value = ner_model(input_)  # run the trained model on the input
        scorer.score(pred_value, gold)
    return scorer.scores

# I call this with my trained model, i.e. ner_model = sp_model

My test data is just a column of text, but the ‘for’ loop here iterates over ‘input_, annot’ pairs. In the ‘for’ loop where I do the prediction, I already used the snippet below to create the ‘doc’ object.

        doc=sp_model(str(data_target[i]))

Also, is it necessary for me to use the ‘GoldParse’ function to get the scores?
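For context, the `evaluate` function above expects each example to be a `(text, {'entities': [...]})` pair whose entities are gold character offsets, which a plain column of text does not provide. Below is a minimal sketch of building such pairs, assuming hand-labelled gold strings are available for each row. The helper `to_example` and the sample row are hypothetical and only illustrate the expected shape.

```python
def to_example(text, gold_strings):
    """Turn {label: substring} gold annotations into (start, end, label) offsets.

    Hypothetical helper: assumes each gold substring occurs in the text
    and uses its first occurrence.
    """
    entities = []
    for label, substring in gold_strings.items():
        start = text.find(substring)
        if start == -1:
            raise ValueError(f"{substring!r} not found in text")
        entities.append((start, start + len(substring), label))
    return text, {"entities": sorted(entities)}


# Made-up row, for illustration only.
example = to_example(
    "John is the Author of this post.",
    {"NAME": "John", "TAG": "Author"},
)
print(example)
```

A list of such pairs could then be passed as `examples`. The gold offsets have to come from human annotation, not from the model being evaluated.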

dsnlp · May 22, 2019, 8:43pm

for text in data_target:
    doc = sp_model(text)
    print([(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])

I get the following output as predictions from the above snippet:

[(784, 792, 'NAME'), (802, 829, 'TAG')]
[(134, 142, 'NAME'), (150, 173, 'TAG')]

Now I use the ‘Scorer’ function to evaluate the model. Below is what I tried in order to convert the predictions on the test data into the ‘entities’ list that ‘GoldParse’ expects:

scorer = Scorer()
for text in data_target:
        doc_gold_text = sp_model.make_doc(text)
        entity = ([(ent.start_char, ent.end_char, ent.label_) for ent in doc_gold_text.ents])
        gold = GoldParse(doc_gold_text, entities=entity)
        pred_value = sp_model(text)
        scorer.score(pred_value, gold)
        print(scorer.scores)

It returned zero for all F, P, R values. When I tried to print doc_gold_text.ents, it was empty. So ‘nlp.make_doc’ does not return any named entities?

Then I removed ‘.make_doc’ and it returned 100 for all F,P,R values from the scorer function!

   doc_gold_text = sp_model(text)

I manually checked the results using the code below and almost all the predictions are correct. Is the score of 100 genuine, or is there some error?

 print([(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])
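The two outcomes above (all zeros, then all 100) are what exact-match span scoring gives when the "gold" side is empty or is a copy of the predictions. The sketch below reproduces that arithmetic in plain Python. It is not spaCy's `Scorer`; `prf` is a hypothetical helper, it returns fractions and not percentages, and the `hand_labelled` offsets are invented for illustration.

```python
def prf(gold_spans, pred_spans):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


predicted = [(784, 792, "NAME"), (802, 829, "TAG")]

# Gold taken from a tokenized-only doc: no entities, so nothing can match.
empty_gold_scores = prf([], predicted)

# Gold taken from the model's own output: everything matches trivially.
self_scores = prf(predicted, predicted)

# Gold from independent annotation (invented offsets): a meaningful score.
hand_labelled = [(784, 792, "NAME"), (800, 829, "TAG")]
real_scores = prf(hand_labelled, predicted)

print(empty_gold_scores, self_scores, real_scores)
```

Only the last comparison says anything about model quality, which is why the gold entities need to come from annotations made independently of the model.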

dsnlp · May 23, 2019, 2:27pm

@ines @honnibal Any inputs on this? I need help.

honnibal · May 25, 2019, 1:26pm

I think this guide to how the pipelines work in spaCy will be useful to you: https://spacy.io/usage/processing-pipelines

This forum is really for Prodigy usage questions, which sometimes do overlap with spaCy usage, so we’re generally quite willing to answer questions as they come up. But the main way we “support” spaCy will always be to write more materials. Developing better resources like documentation, usage videos, or the course Ines developed will always be a more scalable solution than answering individual questions.
