Help with postprocessing annotated data for training a multi-category text classification model

I created a set of rule-based classifiers with spaCy to predict the category of a given small block of text. We then had an annotator accept/reject those predictions. Now I have a Prodigy-annotated dataset with ~16k examples, created via the mark recipe. There are 27 possible labels.

The JSONL looks something like this:

{"text": "I love my S3 it's so fast", "label": "AUDI", "meta": {"id": "27236"}, "answer": "accept"}
{"text": "Look out your window at that beautiful M3", "label": "BMW", "meta": {"id": "86544"}, "answer": "accept"}
{"text": "Nothing better than a day at the track", "label": "AUDI", "meta": {"id": "108341019"}, "answer": "ignore"}
{"text": "I think he's racing a S940 Turbo", "label": "VOLVO", "meta": {"id": "3464"}, "answer": "accept"}
{"text": "Is that a bird or a plane or is it a Tesla?", "label": "BMW", "meta": {"id": "75475454"}, "answer": "reject"}

My goal is to have a model that classifies text, predicting the most likely label.

I've tried a lot of different variations of db-in, loading models, and reading both the spaCy and Prodigy docs, but I can't figure out the proper workflow / data shapes.

I think the challenge is that the annotations don't include an answer for every label on every example. Using the 'Tesla' example above: each example was only shown once, with a single predicted label, so the data can't tell us whether that text should be "label": "TESLA", "answer": "accept" or not - but it definitely should not be treated as "label": "TESLA", "answer": "reject".

Thanks for any assistance.

EDIT: It seems like this would have been the better way to kick off the annotation task: allow the annotators to correct the classifier's prediction. Given that's not an option anymore, what should I do with my data?

Hi! Just to make sure I understand your question correctly: you have an annotated dataset and your categories are mutually exclusive (or not?). And you only have sparse annotations, so you might know that one text isn't about Tesla or BMW, but you don't know what other label(s) it's about?

In general, it's no problem to update the text classifier with incomplete information, and if you use the train command, Prodigy should merge the data and handle it accordingly. The dataset you've collected with mark should already be in the correct format ("text" plus "label" plus "answer"), so you should be able to run training experiments on it directly.
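For example, something along these lines (the exact recipe arguments differ between Prodigy versions, and the dataset and model names here are just placeholders):

```
prodigy train textcat your_dataset en_core_web_sm --output ./textcat_model --eval-split 0.2
```

This trains a text classifier from the annotations in your_dataset and holds back 20% of the examples for evaluation.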

Under the hood, you may only be updating with categories like {"TESLA": 0.0, "BMW": 0.0} for some examples, which may be less effective than updating with values for all categories, including the ones that apply. But it still moves the model in the right direction.
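To make this concrete, here's a minimal sketch (not Prodigy's actual internals, just an illustration of the idea) of how a single mark-style annotation translates into a partial categories dict:

```python
def annotation_to_cats(example):
    # Map one accepted/rejected annotation to a partial "cats" dict.
    # An accept is a positive example for that one label, a reject a
    # negative one. The other 26 labels are simply left out, because
    # the annotation says nothing about them.
    if example["answer"] == "accept":
        return {example["label"]: 1.0}
    if example["answer"] == "reject":
        return {example["label"]: 0.0}
    return None  # "ignore" answers carry no training signal

print(annotation_to_cats(
    {"text": "Is that a bird or a plane or is it a Tesla?",
     "label": "BMW", "answer": "reject"}
))  # {'BMW': 0.0}
```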

Thank you for your helpful answer! The problem I was stuck on (training textcat would only use "accept" answers) seems to have been fixed by the new, slightly clearer Prodigy API. Took the dive and updated, and the model seems to be training.

One related follow-up question: is there a way to do a stratified train/test split? Or is the preferred method to move over to spaCy at that point?

Glad it's working!

Once you get to a point where you want to be more specific about how you sample your training and test data (and you're not just running quick experiments to see if you're on the right track), you might want to do that in a separate step, yes.

Prodigy's default --eval-split setting on the train recipes will just hold back a given percentage of the (shuffled) training examples. That's also how the data-to-spacy recipe does it if you define a split. The --eval-id option on the train recipe lets you pass in the name of a Prodigy dataset that should be used for evaluation. So in theory, you could also use that to provide your own custom evaluation set.
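If you do want a stratified split, one option is to do it yourself in Python, save the two halves back as separate Prodigy datasets, and then point --eval-id at the evaluation set. A rough sketch using Prodigy's database API and scikit-learn (the dataset names are placeholders):

```python
from prodigy.components.db import connect
from sklearn.model_selection import train_test_split

db = connect()
examples = db.get_dataset("car_brands")  # your annotated dataset

# "ignore" answers carry no signal, so leave them out of both splits
examples = [eg for eg in examples if eg["answer"] != "ignore"]

# Stratify on the label/answer combination so that accepts and
# rejects for each of the 27 labels are proportionally represented
strata = [eg["label"] + "_" + eg["answer"] for eg in examples]
train, dev = train_test_split(
    examples, test_size=0.2, stratify=strata, random_state=0
)

for name, split in [("car_brands_train", train), ("car_brands_eval", dev)]:
    db.add_dataset(name)
    db.add_examples(split, datasets=[name])
```

After that, you'd train with something like prodigy train textcat car_brands_train en_core_web_sm --eval-id car_brands_eval. (One caveat: stratification needs at least two examples per stratum, so very rare label/answer combinations may need to be grouped or dropped.)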
