multilabel classification

rohitb7 · July 2, 2018, 10:17pm

Hello Team,
Is there an out-of-the-box recipe for multi-label text classification? Any pointers are appreciated.
Warm regards
Rohit

honnibal · July 3, 2018, 9:02am

Do you mean there are multiple possible labels, but each instance has exactly one, or that there are multiple labels, and each instance may have zero or more?

You can specify multiple labels on the command line by separating them with commas.

The model also supports non-mutually exclusive labelling. For this I would recommend labelling the instances one-by-one, into different datasets, and then merging the datasets together.
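As a rough illustration of the merge step, here is a minimal sketch in plain Python. It assumes each binary annotation is a dict with "text", "label" and "answer" keys, and the output layout with a "cats" dict is only one possible choice. The function name merge_binary_annotations and the sample data are hypothetical, not part of Prodigy.

```python
from collections import defaultdict


def merge_binary_annotations(examples):
    """Combine per-label accept/reject decisions into one record per text.

    Hypothetical helper: `examples` is an iterable of dicts with
    "text", "label" and "answer" keys (an assumed annotation layout).
    """
    merged = defaultdict(dict)
    for eg in examples:
        if eg["answer"] not in ("accept", "reject"):
            continue  # skip ignored examples
        # True if the label was accepted for this text, False if rejected
        merged[eg["text"]][eg["label"]] = eg["answer"] == "accept"
    return [{"text": text, "cats": cats} for text, cats in merged.items()]


# Two separate annotation passes, one per label (made-up sample data)
sports_pass = [
    {"text": "Great match last night", "label": "SPORTS", "answer": "accept"},
    {"text": "New tax bill announced", "label": "SPORTS", "answer": "reject"},
]
politics_pass = [
    {"text": "Great match last night", "label": "POLITICS", "answer": "reject"},
    {"text": "New tax bill announced", "label": "POLITICS", "answer": "accept"},
]

merged_examples = merge_binary_annotations(sports_pass + politics_pass)
```

Each text ends up with one entry per label it was annotated for, so a text can have zero, one or several positive labels.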

rohitb7 · July 3, 2018, 3:52pm

My question was towards non-mutually exclusive labelling. Thank you so much for the response and for resolving the issue.

ad349 · October 16, 2018, 9:43am

What if each instance has zero or more labels? In my case, the labels are hierarchical: Category > Sub-category > Style > Sub-Style. How can I label the data only once, with multiple labels and multiple classes?

honnibal · October 18, 2018, 5:43pm

You could use the choice interface to do this, although I’m not sure it’s really better. You might want to try making multiple passes over the data instead. It sounds worse, but it’s really much faster to annotate each label individually. The decisions are very quick to make, and you don’t have to spend any time in the interface, as you can just click accept or reject. The annotations are also usually more accurate, as the decisions are individually easier.

Mayank · July 6, 2020, 6:42pm

Hi @honnibal

We are building a multi-label (non-exclusive) text classification model, and the data has been annotated using the Prodigy UI. There are almost 4K examples for 10 labels, out of which 1000 are positive. Our questions are:

1. Is the size of the data set (25% are positive examples) enough to train a model for 10 classes?
2. We've used the spaCy CLI to train, where the default metric is the ROC AUC score (macro). How do we get the threshold score for assigning an example to each class?
3. Is there a separate threshold score for every class? If yes, how can we get it?

Thank you

honnibal · July 13, 2020, 12:33pm

Hi @Mayank,

In general there's no real way to guess how many examples are needed to reach a certain accuracy. You could have one problem where a single word's occurrence linearly separates each class, in which case the model just has to learn that one feature. This problem could be learned from extremely few examples. You can also have other problems that the model will not learn at all, no matter how much data you have. So it's not something I can really comment on. The learning curve might help you see whether you need more examples. I would say 1k is probably a bit too few. You should definitely use word vectors in your model at that data size, and possibly also pretraining.

spaCy doesn't currently support setting a threshold for each score. Instead you can handle this yourself in code that interprets the results. The scores are provided in a dictionary, doc.cats, so you can implement your own way of mapping the scores to positive or negative classifications, based on your cost sensitivity and calibration on your development data.
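A minimal sketch of what that could look like, assuming you already have the doc.cats dictionary (label to score) for each text. The helper names, the candidate grid and the choice of F1 as the selection criterion are illustrative assumptions, not spaCy functionality. A plain dict stands in for doc.cats so the example runs without a trained model.

```python
def f1_at(scores, gold, threshold):
    """F1 for one label when predicting positive at score >= threshold."""
    tp = sum(1 for s, g in zip(scores, gold) if s >= threshold and g)
    fp = sum(1 for s, g in zip(scores, gold) if s >= threshold and not g)
    fn = sum(1 for s, g in zip(scores, gold) if s < threshold and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(scores, gold, candidates=None):
    """Pick the candidate threshold with the best F1 on development data.

    Hypothetical helper; on ties the lowest candidate wins.
    """
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return max(candidates, key=lambda t: f1_at(scores, gold, t))


def labels_above_threshold(cats, thresholds, default=0.5):
    """Map a doc.cats-style dict to the labels that clear their threshold."""
    return sorted(
        label for label, score in cats.items()
        if score >= thresholds.get(label, default)
    )


# Made-up development scores and gold answers for one label
dev_scores = [0.9, 0.8, 0.37, 0.32, 0.1]
dev_gold = [True, True, True, False, False]
thresholds = {"SPORTS": pick_threshold(dev_scores, dev_gold)}

# Stand-in for doc.cats from a trained pipeline
cats = {"SPORTS": 0.41, "POLITICS": 0.45}
predicted = labels_above_threshold(cats, thresholds)
```

Here each label gets its own threshold, tuned on held-out data, and labels without a tuned threshold fall back to 0.5. If false positives and false negatives have different costs for you, swap F1 for a criterion that reflects that.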
