That's a nice idea! The `textcat.train-curve` recipe currently uses the number returned by the `textcat.batch-train` recipe function. If you take a look at the code, you'll see that this is `best_acc["accuracy"]`. The full stats returned by `model.evaluate` are the following:
```python
stats = {
    "tp": tp,
    "fp": fp,
    "fn": fn,
    "tn": tn,
    "avg_score": total_score / total,
    "precision": precision,
    "recall": recall,
    "fscore": 2 * ((precision * recall) / (precision + recall + 1e-8)),
    "loss": loss / (len(examples) + 1e-8),
    "accuracy": (tp + tn) / (tp + tn + fp + fn + 1e-8),
    "baseline": baseline,
}
```
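For illustration, here's a small standalone sketch that computes the metrics from the raw counts, using the same formulas as above plus the standard definitions of precision and recall (the counts are made up). It also shows how accuracy and recall can diverge, which is why you might want the curve to track one rather than the other:

```python
# Standalone sketch of the metric formulas above, with made-up counts.
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp + 1e-8)
recall = tp / (tp + fn + 1e-8)
fscore = 2 * ((precision * recall) / (precision + recall + 1e-8))
accuracy = (tp + tn) / (tp + tn + fp + fn + 1e-8)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"fscore={fscore:.2f} accuracy={accuracy:.2f}")
# precision=0.80 recall=0.67 fscore=0.73 accuracy=0.70
```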
So if you want the train curve recipe to compare the recall instead, the easiest way would be to change the batch-train recipe in `recipes/textcat.py` and make it return `best_acc["recall"]`.
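In other words, the change boils down to swapping the metric in the recipe's return value, roughly like this (a simplified sketch of just that line, not the full recipe):

```python
# In the batch-train recipe in recipes/textcat.py (simplified sketch):
# return best_acc["accuracy"]   # current behaviour: report accuracy
return best_acc["recall"]       # report recall, so train-curve compares recall
```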
Btw, you can run the following to find the location of your Prodigy installation:

```
python -c "import prodigy; print(prodigy.__file__)"
```
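And if it helps, a small Python sketch that builds the path to `recipes/textcat.py` from that location (assuming the recipes directory sits inside the installed `prodigy` package):

```python
# Locate recipes/textcat.py relative to the installed prodigy package
# (assumes the recipes directory lives inside the package).
import os
import prodigy

recipe_path = os.path.join(os.path.dirname(prodigy.__file__), "recipes", "textcat.py")
print(recipe_path)
```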