Additional metrics (recall, precision, accuracy, F1) in textcat.train-curve

Hi--

I was wondering whether it's possible to include additional performance metrics in the output of textcat.train-curve, and whether you have plans to do so. Accuracy isn't always the most useful metric when dealing with unbalanced classes (as I am). Are additional metrics in the pipeline for textcat.train-curve, and can you suggest any workarounds in the meantime? (Apart from manually splitting the data into various sizes and running textcat.batch-train on each, I guess.)

Cheers!

That's a nice idea! The textcat.train-curve recipe currently uses the number returned by the textcat.batch-train recipe function. If you take a look at the code, you'll see that this is best_acc["accuracy"]. The full stats dict returned by model.evaluate looks like this:

stats = {
    "tp": tp,
    "fp": fp,
    "fn": fn,
    "tn": tn,
    "avg_score": total_score / total,
    "precision": precision,
    "recall": recall,
    "fscore": 2 * ((precision * recall) / (precision + recall + 1e-8)),
    "loss": loss / (len(examples) + 1e-8),
    "accuracy": (tp + tn) / (tp + tn + fp + fn + 1e-8),
    "baseline": baseline,
}
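To see why accuracy can look fine on unbalanced data while the other numbers don't, here's a small sketch that recomputes the count-based entries of that dict from made-up counts. The function name textcat_stats is hypothetical, and the precision and recall formulas are an assumption (the snippet above only shows the variable names, not how they're computed).

```python
from typing import Dict


def textcat_stats(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Recompute the count-based metrics from the stats dict above.

    Assumption: precision = tp / (tp + fp) and recall = tp / (tp + fn),
    smoothed with the same 1e-8 the snippet uses for fscore and accuracy.
    avg_score, loss and baseline are left out because they depend on the
    model's scores, not just on the confusion counts.
    """
    eps = 1e-8
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "precision": precision,
        "recall": recall,
        "fscore": 2 * ((precision * recall) / (precision + recall + eps)),
        "accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
    }


# Hypothetical imbalanced example: 20 positives out of 1000 examples.
stats = textcat_stats(tp=5, fp=5, fn=15, tn=975)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy comes out at 0.98 while recall is only 0.25 and F1 about 0.33, which is exactly the situation where accuracy alone is misleading.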

So if you want the train-curve recipe to compare recall instead, the easiest way is to edit the batch-train recipe in recipes/textcat.py so that it returns best_acc["recall"].
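If you'd rather not hard-code one metric, here's a rough sketch of the train-curve idea with the metric made a parameter. This is not Prodigy's code: train_fn is a hypothetical stand-in for running batch-train on a subset and getting back the whole stats dict (e.g. best_acc) instead of a single number, and train_curve and its arguments are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Stats = Dict[str, float]


def train_curve(
    train_fn: Callable[[Sequence], Stats],
    examples: Sequence,
    fractions: Sequence[float] = (0.25, 0.5, 0.75, 1.0),
    metric: str = "fscore",
) -> List[Tuple[float, float]]:
    """Train on growing slices of `examples` and report one chosen metric.

    `train_fn` is a hypothetical stand-in for batch-train: it takes a list
    of examples and returns a stats dict with keys like "recall" or "fscore".
    """
    results = []
    for frac in fractions:
        n = int(len(examples) * frac)  # size of this training slice
        stats = train_fn(examples[:n])
        if metric not in stats:
            raise KeyError(f"unknown metric {metric!r}; available: {sorted(stats)}")
        results.append((frac, stats[metric]))
    return results


# Usage with a fake trainer whose "recall" just grows with the data size.
def fake_train(subset: Sequence) -> Stats:
    return {"accuracy": 0.9, "recall": len(subset) / 100}


curve = train_curve(fake_train, list(range(100)), metric="recall")
print(curve)
```

Raising on an unknown key keeps a typo like "f1" from silently producing an empty curve.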

Btw, you can run the following to find the location of your Prodigy installation:

python -c "import prodigy; print(prodigy.__file__)"
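If you want the path to the recipe file directly, a small sketch like this works without executing the package's import code. It assumes the recipes live in a recipes/ folder next to prodigy/__init__.py; that layout is an assumption, so double-check it against the output of the command above. find_recipe_file is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_recipe_file(package: str = "prodigy", rel: str = "recipes/textcat.py") -> Optional[Path]:
    """Return the assumed path of a file inside an installed package, or None.

    importlib.util.find_spec locates a top-level package without importing it;
    it returns None if the package isn't installed.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py, so its parent is the package dir.
    return Path(spec.origin).parent / rel


print(find_recipe_file())  # None if Prodigy isn't installed in this environment
```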