Thanks a lot, that’s nice to hear!
The textcat.eval recipe (see here for example usage and output) is mostly useful to create an evaluation set in "real time" and see how your model is performing on unseen text.
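To give a rough idea of the workflow, an invocation usually looks something like the command below. The dataset, model and file names here are just placeholders, and the exact arguments can vary between Prodigy versions, so check prodigy textcat.eval --help for the options your install supports:

```bash
# Placeholder names: swap in your own dataset, model path, input file and label
prodigy textcat.eval eval_dataset /path/to/textcat-model new_texts.jsonl --label MY_LABEL
```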
For example, let's say you've trained or updated a model and you want to see how it performs on new data. You can then use textcat.eval with your model and stream in the texts you want to test it on. The web app lets you click accept/reject on the model's predictions, and when you exit the server, you'll see a detailed breakdown of how the model performed, compared to the "correct" answers (i.e. your decisions):
MODEL      USER      COUNT
accept     accept    47       # both you and the model said yes
accept     reject    7        # model said yes, you said no
reject     reject    95       # both you and the model said no
reject     accept    7        # model said no, you said yes
Correct    142                # total correct predictions
Incorrect  14                 # total incorrect predictions
Baseline   0.65               # baseline to beat (score if all answers were the same)
Precision  0.87
Recall     0.87
F-score    0.87
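If it helps to see how those numbers fit together: the last four rows are just the standard definitions applied to the accept/reject counts above, with the baseline being the accuracy you'd get by always giving the majority answer. A quick sketch of the arithmetic (not the recipe's actual code):

```python
# Counts from the breakdown above: (model decision, your decision)
tp = 47  # accept / accept
fp = 7   # accept / reject
tn = 95  # reject / reject
fn = 7   # reject / accept

total = tp + fp + tn + fn                  # 156
correct, incorrect = tp + tn, fp + fn      # 142, 14
baseline = max(tp + fn, fp + tn) / total   # 0.65 (accuracy if every answer were the majority class)
precision = tp / (tp + fp)                 # 0.87
recall = tp / (tp + fn)                    # 0.87
f_score = 2 * precision * recall / (precision + recall)  # 0.87

print(f"Correct {correct}, Incorrect {incorrect}, Baseline {baseline:.2f}")
print(f"Precision {precision:.2f}, Recall {recall:.2f}, F-score {f_score:.2f}")
```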
We think the recipe is especially useful as a developer tool, while you're still working on the model and tweaking it. It may not replace your final evaluation process, but it's a quick sanity check, and it lets you label evaluation data and evaluate the model at the same time.
(It also makes it easy to ask a colleague to do a quick evaluation run for you, if you're worried that you're not "strict" enough with your model. All they have to do is click a few hundred times, and you'll immediately have some numbers and at least a rough idea of whether you're on the right track or not.)