Imbalanced classes in a multiclass textcat lead to completely biased predictions

clrogers · February 20, 2018, 11:56pm

Matthew,

I really appreciate you sharing your NLP expertise. The factors you’ve mentioned had occurred to me and are on my radar, but your specific insight has helped me think about these issues in a new way.

The reason I haven’t tried to address these data and problem-definition issues wholesale is that I’m using spaCy and Prodigy to quickly build a proof of concept, and I’m doing it on my own time. I feel that my company could benefit from advanced NLP pipelines and models (an area where I’m relatively proficient), but it’s not my core responsibility, and without a decent solution to show, it’s hard to convince leadership to allocate my time to these specific problems. I believe spaCy and Prodigy can dramatically reduce the time it takes me to build out a concept, which is why I’ve invested my time and money in learning them. However, the documentation is still a bit raw and the examples a bit sparse, and despite reading everything available and studying all the non-compiled code, the next steps just weren’t as obvious to me as they would be with tools I’m more familiar with.

I’m looking forward to those wrappers for tools like TensorFlow, but in the meantime, I appreciate the tip to tinker with the CNN architecture. Time to start digging into the Thinc documentation.
