Hi, I would like to print out different stats during each batch-train iteration. Is there any way to do this?
Sure! How you do it obviously depends on what you want to output, but the batch-train recipe is a regular Python function, so a good place to start could be to look at how it's implemented. You can find the location of your Prodigy installation like this:
python -c "import prodigy; print(prodigy.__file__)"
If you check out the batch_train function in recipes/ner.py, you'll see that on each iteration, the model.evaluate method returns a dictionary of stats, which should look something like this:
{
    'right': 52.0,   # correct entities
    'wrong': 10.0,   # wrong entities
    'unk': 8.0,      # unknown entities
    'ents': 70.0,    # total entities
    'acc': 0.84      # accuracy
}
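So in your copy of the recipe, you can print whatever you need from that dictionary on each pass. Here's a rough sketch, where i and stats are just placeholders for the loop counter and the dictionary returned by model.evaluate inside your copied loop:

# Rough sketch: print extra stats on each training iteration.
# `i` and `stats` are placeholders for the loop counter and the dict
# returned by model.evaluate inside your copied batch_train loop.
def print_iteration_stats(i, stats):
    print("Epoch {}: {:.0f}/{:.0f} correct ({:.2%}), {:.0f} wrong, {:.0f} unknown".format(
        i, stats['right'], stats['ents'], stats['acc'], stats['wrong'], stats['unk']))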
The batch_train recipe function also returns the stats of the best epoch once training is finished. This lets you call the function from another recipe – for example, similar to the train_curve function, which runs several batch training sessions and outputs the results.
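If it helps, that pattern looks roughly like this. The arguments below are just placeholders, so check the real signature of the recipe function in recipes/ner.py before wiring it up:

# Rough sketch: call your copied recipe function several times and collect
# the best-epoch stats it returns. Arguments are placeholders, not the
# recipe's real signature.
results = []
for factor in (0.25, 0.5, 0.75, 1.0):  # e.g. train on increasing portions of the data
    best_stats = batch_train2(dataset, spacy_model, factor=factor)
    results.append((factor, best_stats))
for factor, stats in results:
    print(factor, stats['acc'])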
Thank you. I tried this and I am still stuck. I copied the batch-train recipe to a new recipe, batch-train2, so that I could replace the call to model.evaluate with my own function. So far so good. What I am specifically trying to do is collect the TP, FP, and FN counts on a per-label basis. This should be easy if I can access the list of annotated spans and the model output. For the former, I used this code:
def gold_to_spacy(examples):
    annotations = []
    for eg in examples:
        entities = [(span['start'], span['end'], span['label'])
                    for span in eg.get('spans', [])]
        annot_entry = [eg['text'], {'entities': entities}]
        annotations.append(annot_entry)
    return annotations
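For example, with one made-up annotated example it produces this:

# Illustrative usage with made-up data:
examples = [{'text': 'Apple is based in Cupertino',
             'spans': [{'start': 0, 'end': 5, 'label': 'ORG'},
                       {'start': 18, 'end': 27, 'label': 'GPE'}]}]
print(gold_to_spacy(examples))
# [['Apple is based in Cupertino', {'entities': [(0, 5, 'ORG'), (18, 27, 'GPE')]}]]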
That gives me each text with its list of (start, end, label) annotation tuples. But I can't seem to figure out how to get the equivalent information for the results of applying the model (EntityRecognizer) to the input text.
I think you just need something like:
texts = [eg['text'] for eg in examples]
for doc in model.nlp.pipe(texts):
    ents = [{'start': span.start_char, 'end': span.end_char, 'label': span.label_}
            for span in doc.ents]
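From there, one way to get the per-label TP/FP/FN counts is to compare those predictions against your annotated spans. Here's a sketch that uses exact (start, end, label) matching, so adjust it if you want to count partial overlaps differently:

from collections import defaultdict

def per_label_counts(examples, model):
    # Sketch: collect per-label TP/FP/FN by exact (start, end, label) match
    # between the model's predictions and the annotated spans.
    counts = defaultdict(lambda: {'tp': 0, 'fp': 0, 'fn': 0})
    texts = [eg['text'] for eg in examples]
    for eg, doc in zip(examples, model.nlp.pipe(texts)):
        gold = {(s['start'], s['end'], s['label']) for s in eg.get('spans', [])}
        pred = {(s.start_char, s.end_char, s.label_) for s in doc.ents}
        for _, _, label in pred & gold:
            counts[label]['tp'] += 1
        for _, _, label in pred - gold:
            counts[label]['fp'] += 1
        for _, _, label in gold - pred:
            counts[label]['fn'] += 1
    return dict(counts)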