Help with textcat workflow

This is definitely a good question, and it really depends on the data and the label distribution – if you have lots of labels, including some that are rare, you usually want a larger evaluation set to make sure all labels are covered. If the set is too small, your results also become harder to interpret: when you're only evaluating on a small number of examples, even one or two individual predictions can account for a difference of a few percent in accuracy.
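As a quick sanity check, you can count how often each label appears in your evaluation split. This is just a minimal sketch – it assumes your examples are stored as dicts with a "label" key, which may not match your setup:

    # Sketch: check that every label is represented in the evaluation set.
    # Assumes examples look like {"text": "...", "label": "POSITIVE"}.
    from collections import Counter

    def check_label_coverage(eval_examples, all_labels):
        counts = Counter(ex["label"] for ex in eval_examples)
        for label in all_labels:
            if counts[label] == 0:
                print(f"Label {label!r} has no evaluation examples!")
            else:
                print(f"{label}: {counts[label]} evaluation examples")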

In the beginning, aiming for an evaluation set of about the same size as your training set might be a good approach, so you could train on 300 examples and evaluate on 300. Once you're satisfied with your evaluation set, you can keep it stable and train on 800, 1000 or 1200 examples, always evaluating against the same evaluation examples. A rough sketch of that workflow follows below.
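Here's one way to set that up – the data format (a list of (text, label) tuples) and the train_and_evaluate helper are assumptions for illustration, not part of your setup. The key point is to shuffle once with a fixed seed, hold out a stable evaluation set, and train on progressively larger slices of the remainder:

    # Sketch: fixed evaluation set, growing training set.
    import random

    def make_splits(examples, eval_size=300, seed=0):
        rng = random.Random(seed)          # fixed seed so the split is reproducible
        shuffled = examples[:]
        rng.shuffle(shuffled)
        eval_set = shuffled[:eval_size]    # stays the same across experiments
        train_pool = shuffled[eval_size:]  # grows as you annotate more data
        return train_pool, eval_set

    # train_pool, eval_set = make_splits(all_examples)
    # for n in (300, 800, 1000, 1200):
    #     train_and_evaluate(train_pool[:n], eval_set)  # hypothetical helper

This way, any change in accuracy between runs comes from the training data, not from a shifting evaluation set.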
