textcat.eval use and Prodigy's evaluation workflow?

Thanks a lot, that’s nice to hear! :blush:

The textcat.eval recipe (see here for example usage and output) is mostly useful for creating an evaluation set in “real time” and seeing how your model performs on unseen text.
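
To give a rough idea of what a call looks like, assuming you have a loadable spaCy model and a JSONL file of unseen texts: the dataset name, model, file and label below are just placeholders, and the exact arguments can differ between Prodigy versions, so check prodigy textcat.eval --help for the signature your install expects.

prodigy textcat.eval eval_set en_core_web_sm unseen_texts.jsonl --label MY_LABEL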

For example, let’s say you’ve trained or updated a model and you want to see how it performs on new data. You can then use textcat.eval with your model and stream in the texts you want to test it on. The web app lets you click accept/reject on the model’s predictions, and when you exit the server, you’ll see a detailed breakdown of how the model performed, compared to the “correct” answers (i.e. your decisions):

MODEL   USER   COUNT
accept  accept    47   # both you and the model said yes
accept  reject     7   # model said yes, you said no
reject  reject    95   # both you and the model said no
reject  accept     7   # model said no, you said yes 

Correct     142        # total correct predictions
Incorrect    14        # total incorrect predictions

Baseline      0.65     # baseline to beat (score if all answers were the same)
Precision     0.87
Recall        0.87
F-score       0.87
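
To make those summary numbers less magical: they follow directly from the four counts in the table, treating your accepts as the positive class. This is only a sketch of the arithmetic (not Prodigy's internal implementation), and it reads “baseline” as the score you'd get by always giving the majority answer:

# counts from the (model, user) table above
tp = 47   # model accept, user accept
fp = 7    # model accept, user reject
tn = 95   # model reject, user reject
fn = 7    # model reject, user accept

total = tp + fp + tn + fn                                 # 156
correct = tp + tn                                         # 142
baseline = max(tp + fn, fp + tn) / total                  # 0.65 (always answering "reject" here)
precision = tp / (tp + fp)                                # 0.87
recall = tp / (tp + fn)                                   # 0.87
f_score = 2 * precision * recall / (precision + recall)   # 0.87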

We think the recipe is especially useful as a developer tool while you’re still working on the model and tweaking it. It may not replace your final evaluation process, but it gives you a quick sanity check and lets you label evaluation data and evaluate the model at the same time.

(It also makes it easy to ask a colleague to do a quick evaluation run for you if you’re worried that you’re not being “strict” enough with your model :wink: All they have to do is click a few hundred times, and you’ll immediately have some numbers and at least a rough idea of whether you’re on the right track.)
