So I'm writing a custom textcat batch train recipe that will persist the model to a cloud data source.
To accomplish this, I copied the textcat.batch_train recipe in the repo and added some calls to our persistence client.
While doing this, I noticed that batch_train selects the model with the highest accuracy as the 'best' model. My dataset is very sparse in passages that should receive the label, so a model could maximize accuracy simply by always predicting negative for the label. I've changed my recipe to select the best model by F-score instead of accuracy -- are there any downsides to this approach?
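For context, here's a toy sketch of the failure mode I mean (pure Python, not the actual recipe code; the numbers and helper names are made up for illustration):

```python
# On a sparse dataset (e.g. 5% positives), a model that always predicts
# "negative" scores high accuracy but zero F-score.
def scores(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, f1

y_true = [1] * 5 + [0] * 95     # 5% of passages carry the label
always_neg = [0] * 100          # degenerate accuracy-maximizing model
acc, f1 = scores(y_true, always_neg)
print(acc, f1)                  # 0.95 0.0
```

Selecting by accuracy would happily keep this degenerate model; selecting by F-score rejects it outright.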
Maybe such an option could be included in the batch_train recipes?