ner.batch-train callback?

Hi, I would like to print out different stats during each batch-train iteration. Is there any way to do this?

Sure! How you do it obviously depends on what you want to output, but the batch-train recipe is a regular Python function, so a good place to start is to look at how it’s implemented. You can find the location of your Prodigy installation like this:

python -c "import prodigy; print(prodigy.__file__)"

If you check out the batch_train function in recipes/ner.py, you’ll see that on each iteration, the model.evaluate method returns a dictionary of stats, which should look something like this:

{
    'right': 52.0,      # correct entities
    'wrong': 10.0,      # wrong entities
    'unk': 8.0,         # unknown entities
    'ents': 70.0,       # total entities
    'acc': 0.84         # accuracy
}
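
So if you just want to print extra stats on each iteration, you can add a few lines to your copy of the recipe right after the evaluation step. A minimal sketch – the loop variable i and the evals examples are assumptions based on how the recipe is typically structured, so check your recipes/ner.py for the exact names:

# inside the training loop of your copied recipe, after the update step
stats = model.evaluate(evals)  # the dict shown above
print("iter {}: acc {:.2f} ({:.0f} right, {:.0f} wrong, {:.0f} unk, {:.0f} ents)".format(
    i, stats['acc'], stats['right'], stats['wrong'], stats['unk'], stats['ents']))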

The batch_train recipe function also returns the stats of the best epoch once training is finished. This lets you call the function from another recipe – the train_curve recipe does exactly that: it runs several batch training sessions and outputs the results.
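
For instance, a hypothetical wrapper recipe could look roughly like this – the exact arguments batch_train takes depend on your Prodigy version, so check the function signature in recipes/ner.py before copying this:

from prodigy.recipes.ner import batch_train

# run several training sessions with different settings and compare
# the best-epoch stats each one returns (argument names are illustrative)
for n_iter in (5, 10, 20):
    best_stats = batch_train('my_dataset', 'en_core_web_sm', n_iter=n_iter)
    print(n_iter, best_stats)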

Thank you. I tried this and I am still stuck. I copied the batch-train recipe to a new recipe, batch-train2, so that I could replace the call to model.evaluate with my own function. So far so good. What I am specifically trying to do is collect the TP, FP, and FN counts on a per-label basis. This should be easy if I can access the list of annotated spans and the model output. For the former, I used this code:

def gold_to_spacy(examples):
    annotations = []
    for eg in examples:
        # annotated spans as (start, end, label) tuples
        entities = [(span['start'], span['end'], span['label'])
                    for span in eg.get('spans', [])]
        annot_entry = [eg['text'], {'entities': entities}]
        annotations.append(annot_entry)
    return annotations

That returns a list of (start, end, label) annotation tuples. But I can’t seem to figure out how to get the equivalent information from the results of applying the model (an EntityRecognizer) to the input text.

I think you just need something like:

texts = [eg['text'] for eg in examples]
predictions = []
for doc in model.nlp.pipe(texts):
    # predicted spans in the same (start, end, label) format as the gold data
    ents = [{'start': span.start_char, 'end': span.end_char, 'label': span.label_}
            for span in doc.ents]
    predictions.append(ents)
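
From there, getting per-label TP/FP/FN counts is just a matter of comparing the two span lists. A rough sketch – a hypothetical helper, not part of Prodigy, which only counts a prediction as a true positive when start, end and label all match a gold span exactly:

from collections import Counter

def per_label_counts(examples, model):
    tp, fp, fn = Counter(), Counter(), Counter()
    texts = [eg['text'] for eg in examples]
    for eg, doc in zip(examples, model.nlp.pipe(texts)):
        gold = {(s['start'], s['end'], s['label'])
                for s in eg.get('spans', [])}
        pred = {(span.start_char, span.end_char, span.label_)
                for span in doc.ents}
        for _, _, label in pred & gold:   # exact matches
            tp[label] += 1
        for _, _, label in pred - gold:   # predicted but not annotated
            fp[label] += 1
        for _, _, label in gold - pred:   # annotated but missed
            fn[label] += 1
    return tp, fp, fn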