Multi-class textcat

madhujahagirdar · March 21, 2018, 4:00pm

I am performing multi-class text classification and have a question. If I have 10 different classes, do I need accept and reject annotations for all of them? Or, because the task is multi-class, does it assume that if an example belongs to one class, it is not part of any other class?

Now, if I do need reject annotations and I have 10 classes, does that mean that for every accept dataset I need to create 10 reject datasets, one for each class?

honnibal · March 27, 2018, 3:33pm

It will probably make sense to merge the annotations into one dataset, yes. I think it’ll be best to train spaCy’s text classifier directly for this, instead of using Prodigy’s textcat.batch-train recipe. Once you’ve made your dataset, you can also try out other text classification tools such as scikit-learn, Facebook’s FastText, etc.
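A minimal sketch of what merging per-class annotations into one dataset could look like, assuming the classes are mutually exclusive. It assumes each annotation is a dict with "text", "label" and "answer" keys, similar to what Prodigy's textcat recipes produce; the function name `merge_annotations` is hypothetical. An "accept" for one label implies 0.0 for every other label, while a "reject" only tells you that one label is false. Labels with no evidence are left out of the dict, so they stay unknown.

```python
from collections import defaultdict


def merge_annotations(examples, labels):
    """Hypothetical helper: merge per-label accept/reject records into
    one cats dict per text, assuming mutually exclusive classes."""
    merged = defaultdict(dict)
    for eg in examples:
        text, label, answer = eg["text"], eg["label"], eg["answer"]
        if answer == "accept":
            # Exclusive classes: the accepted label is true, all others false.
            merged[text].update(
                {other: 1.0 if other == label else 0.0 for other in labels}
            )
        elif answer == "reject":
            # A reject only says this one label is false; others stay unknown.
            merged[text][label] = 0.0
        # Any other answer (e.g. "ignore") is skipped.
        # On conflicting records for the same text, later records win.
    return dict(merged)


labels = ["classA", "classB", "classC"]
examples = [
    {"text": "t1", "label": "classA", "answer": "accept"},
    {"text": "t2", "label": "classB", "answer": "reject"},
    {"text": "t3", "label": "classC", "answer": "ignore"},
]
merged = merge_annotations(examples, labels)
```

With exclusive classes, one accept per text is enough to fill in every class, so you don't need a separate reject dataset for each class; rejects are still useful when no accept exists for that text.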

To do multi-class classification with spaCy, note that the cats dictionary you pass into GoldParse needs an entry for every class in your dataset. Missing classes are treated as unknown values. So, say an example's true class is "classA" and the other two classes are false. You'll have a dictionary like this:

cats = {"classA": 1.0, "classB": 0.0, "classC": 0.0}
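Building that dictionary by hand gets error-prone with 10 classes, so here is a small sketch that generates it from a single true label. The helper name `make_cats` is hypothetical; the resulting dict is what would be passed as the cats value to GoldParse, as described above. Every class gets an explicit score, so none is left missing and treated as unknown.

```python
def make_cats(true_label, all_labels):
    """Hypothetical helper: return a cats dict with an explicit score
    for every class (1.0 for the true class, 0.0 for all others)."""
    if true_label not in all_labels:
        raise ValueError(f"unknown label: {true_label!r}")
    return {label: 1.0 if label == true_label else 0.0 for label in all_labels}


LABELS = ["classA", "classB", "classC"]
cats = make_cats("classA", LABELS)
# cats == {"classA": 1.0, "classB": 0.0, "classC": 0.0}
```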
