Print accuracy in prodigy train textcat

hi @cbjrobertson!

Thanks for your question.

Prodigy's evaluation uses spaCy's scorer (see code); prodigy train is just a wrapper around it. Unfortunately, I don't see raw accuracy available there. The scorer offers many of the other, more common evaluation metrics like precision, recall, F1 (micro and macro), and AUC, but to my knowledge it doesn't report raw accuracy, exactly because of its distortion effects, especially on imbalanced data.

One idea: if you've trained your own model (e.g., run prodigy train my_model_folder --textcat train_data,eval:eval_data) and your model is now in the folder my_model_folder, you'll find a meta.json file that includes the full scorer output, including performance by category. If you used a dedicated holdout evaluation dataset (like eval_data), you can combine those per-label scores with your counts by class and likely back out the raw accuracy, roughly like the sketch below.
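Here's a rough sketch of that back-out calculation. It assumes spaCy writes per-label precision/recall under performance -> cats_f_per_type in meta.json (the exact key can vary by version), and that my_model_folder, model-best, and the gold counts are placeholders you'd swap for your own model folder and eval set:

```python
import json
from pathlib import Path

# Prodigy/spaCy typically write the trained pipeline to model-best / model-last.
model_dir = Path("my_model_folder/model-best")
meta = json.loads((model_dir / "meta.json").read_text())

# Per-label precision/recall/F from the scorer (key name may vary by version).
per_type = meta["performance"]["cats_f_per_type"]

# Gold example counts per label in your holdout set -- you'd compute these
# from eval_data yourself; the numbers here are made up for illustration.
gold_counts = {"POSITIVE": 120, "NEGATIVE": 380}
total = sum(gold_counts.values())

for label, scores in per_type.items():
    p, r = scores["p"], scores["r"]
    n_gold = gold_counts[label]
    tp = r * n_gold                       # recall = TP / (TP + FN)
    fp = tp * (1 - p) / p if p else 0.0   # precision = TP / (TP + FP)
    fn = n_gold - tp
    tn = total - tp - fp - fn
    acc = (tp + tn) / total               # one-vs-rest accuracy for this label
    print(f"{label}: approx. accuracy {acc:.3f}")
```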

Otherwise, you'll likely need to write your own custom script, e.g., something like the sketch below. We had a similar request for a confusion matrix, but that was for ner, not classification.
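As an example of that custom-script route, here's a minimal sketch of a textcat confusion matrix. It assumes you've already collected (gold_label, predicted_label) pairs from your model; the pairs shown are made up:

```python
from collections import Counter

# Replace with the (gold, predicted) label pairs from your own eval run.
pairs = [("POSITIVE", "POSITIVE"), ("NEGATIVE", "POSITIVE"), ("NEGATIVE", "NEGATIVE")]

matrix = Counter(pairs)
labels = sorted({g for g, _ in pairs} | {p for _, p in pairs})

# Print a simple gold-by-predicted table.
print("gold \\ pred".ljust(12) + "".join(label.ljust(12) for label in labels))
for gold in labels:
    row = "".join(str(matrix[(gold, pred)]).ljust(12) for pred in labels)
    print(gold.ljust(12) + row)
```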

Another idea would be to bypass Prodigy/spaCy's built-in evaluation metrics entirely. After you train a model, run your evaluation dataset through it, get the predicted probabilities, and then set your own threshold to calculate accuracy by hand. This may also help convince others, since they could see the calculations laid out in a spreadsheet.
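A minimal sketch of that by-hand calculation, assuming a textcat model trained into my_model_folder and a small list of (text, gold_label) pairs standing in for your eval data:

```python
import spacy

nlp = spacy.load("my_model_folder/model-best")

# Made-up examples -- load your real eval_data here.
eval_examples = [
    ("this is great", "POSITIVE"),
    ("this is terrible", "NEGATIVE"),
]

threshold = 0.5  # pick your own cutoff
correct = 0
for text, gold in eval_examples:
    doc = nlp(text)
    # doc.cats maps each label to a score; take the highest-scoring label
    # and count it as correct only if it clears the threshold.
    pred, score = max(doc.cats.items(), key=lambda kv: kv[1])
    if score >= threshold and pred == gold:
        correct += 1

print(f"Raw accuracy: {correct / len(eval_examples):.3f}")
```

From there it's easy to dump the per-example scores to a CSV so others can check the arithmetic in a spreadsheet.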

I'm sorry there isn't (to my knowledge) a simpler way of doing this -- but I think spaCy's scorer was designed this way (i.e., without raw accuracy) precisely to avoid raw accuracy being misinterpreted for multiclass models, for the reasons you mentioned.

This may not fully answer your question, but you may also find @koaning's spacy-report package helpful for textcat models: it lets you try different thresholds and visualize their effects on precision/recall by label.
