I am performing multi-class text classification, and I need a clarification. If I have 10 different classes, should I have accept and reject datasets for all of them, or, because I have multiple classes, is it assumed that a text belonging to one class is not part of any other class?
Now, if I do need reject datasets and I have 10 classes, then for every accept do I need to create 10 reject datasets, one for each class?
It will probably make sense to merge the annotations into one dataset, yes. I think it’ll be best to train spaCy’s text classifier directly for this, instead of using Prodigy’s textcat.batch-train recipe. Once you’ve made your dataset, you can also try out other text classification tools such as scikit-learn, Facebook’s FastText, etc.
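As a rough illustration of the merging step, here is a minimal sketch in plain Python. It assumes each annotation is a dict with "text", "label" and "answer" keys, and that your classes are mutually exclusive; the function name `merge_annotations` and the `exclusive` flag are hypothetical names made up for this example, not part of Prodigy or spaCy. The idea is that an accept for one class implies a reject for all the others, so you shouldn't need to annotate those rejects separately.

```python
from collections import defaultdict


def merge_annotations(annotations, all_labels, exclusive=True):
    """Merge binary accept/reject annotations into one label dict per text.

    Hypothetical helper: assumes each record has "text", "label", "answer".
    """
    merged = defaultdict(dict)
    for eg in annotations:
        cats = merged[eg["text"]]
        if eg["answer"] == "accept":
            if exclusive:
                # With mutually exclusive classes, one accepted class
                # rules out every other class for this text.
                for label in all_labels:
                    cats[label] = False
            cats[eg["label"]] = True
        elif eg["answer"] == "reject":
            # A reject only tells us about this one class; the other
            # classes stay missing. An earlier accept is not overwritten.
            cats.setdefault(eg["label"], False)
        # Any other answer (e.g. "ignore") is skipped.
    return dict(merged)


labels = ["classA", "classB", "classC"]
annotations = [
    {"text": "first text", "label": "classA", "answer": "accept"},
    {"text": "second text", "label": "classB", "answer": "reject"},
]
merged = merge_annotations(annotations, labels)
print(merged["first text"])
print(merged["second text"])
```

The first text ends up with a value for all three classes, while the second only has an entry for the class that was rejected. If your classes are not mutually exclusive, you could pass `exclusive=False` so that an accept says nothing about the other classes.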
To do multi-class classification with spaCy, note that the cats dictionary you pass into GoldParse needs an entry for every class in your dataset. Missing classes are treated as unknown values. So, let’s say you have one true class “classA” and two untrue classes. You’ll have a dictionary like this: