I am trying to use spaCy's Scorer to calculate precision, recall, and F-score on the predictions I got from a trained spaCy model. With the code below I load the trained model and the test data (rows of text) and predict the labels.
import pandas as pd
import spacy

sp_model = spacy.load('/Folder/CustommodeltrainedinSpacy/')
test_data = pd.read_csv('/data.csv')
data_target = test_data['Text_column']  # text column
# One row per test text, one column per custom label
df = pd.DataFrame(columns=['NAME', 'TAG'], index=range(len(data_target)))
for i in range(len(data_target)):
    doc = sp_model(str(data_target[i]))
    for ent in doc.ents:
        if ent.label_ in df.columns:
            # .loc avoids chained assignment; the last entity per label wins
            df.loc[i, ent.label_] = ent.text
I have annotated data for two custom labels, ‘NAME’ and ‘TAG’, and trained the model on it. The snippet above gives me a dataframe with one column per label, each holding the text that was tagged with that label.
   NAME      TAG
0  John   Author
1  Mike  Student
Now that I have the predictions from the trained model, how do I evaluate using the scorer function?
import spacy
from spacy.gold import GoldParse  # spaCy v2 API
from spacy.scorer import Scorer

def evaluate(ner_model, examples):
    scorer = Scorer()
    for input_, annot in examples:
        # Tokenize only, then attach the gold entity annotations
        doc_gold_text = ner_model.make_doc(input_)
        gold = GoldParse(doc_gold_text, entities=annot['entities'])
        # Run the trained model passed in as ner_model on the raw text
        pred_value = ner_model(input_)
        scorer.score(pred_value, gold)
    return scorer.scores
My test data is just a column of text, but here the ‘for’ loop iterates over pairs of ‘input_’ and ‘annot’. In the ‘for’ loop where I do the prediction, I already create the ‘doc’ object with the snippet below.
doc = sp_model(str(data_target[i]))
Also, is it necessary for me to use the ‘GoldParse’ function to get the scores?
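To make the question concrete, here is a minimal sketch of what I understand the evaluation to need. It assumes gold annotations in the (text, {"entities": [(start, end, label)]}) format that the loop in evaluate() unpacks, and it computes exact-match precision, recall, and F1 in plain Python. The helper `prf`, the sample sentences, their character offsets, and the stand-in predictions are all hypothetical, and this is not spaCy's own Scorer implementation.

```python
# Hypothetical gold data in the (text, {"entities": [(start, end, label)]})
# format that evaluate() unpacks as (input_, annot).
examples = [
    ("John is an Author", {"entities": [(0, 4, "NAME"), (11, 17, "TAG")]}),
    ("Mike is a Student", {"entities": [(0, 4, "NAME"), (10, 17, "TAG")]}),
]


def prf(gold_spans, pred_spans):
    """Exact-match precision, recall and F1 over two collections of spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # spans that match in position and label
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Stand-in predictions; with a real model these would come from
# [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents].
predicted = [
    [(0, 4, "NAME"), (11, 17, "TAG")],
    [(0, 4, "NAME")],  # the TAG entity is missed in the second text
]

# Prefix each span with its document index so spans from different
# texts cannot collide when pooled into one set.
gold_all = {(i, *span) for i, (_, annot) in enumerate(examples)
            for span in annot["entities"]}
pred_all = {(i, *span) for i, spans in enumerate(predicted)
            for span in spans}

precision, recall, f1 = prf(gold_all, pred_all)
print(precision, recall, f1)
```

Either way, a set of gold entity offsets for each test text seems to be required. The dataframe of predicted strings alone has nothing to be compared against.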