Multilabel text classification annotation approach

Hey guys,

I just have a quick question. If I’m training a multiclass text classifier where the classes are mutually exclusive, is it better to annotate samples label by label, or all at once?

As an example, take a sentiment classifier with positive/negative/neutral labels. Should I:

  1. Go through and annotate each example as positive or not, then make a second pass to label the negative instances, and finally a third pass to label the neutral instances.

  2. Make a single pass with all three labels and annotate each instance once.

Also, how do reject examples affect the training process when there are more than two labels? I would have more reject examples in total than accepted examples (assuming the classes are balanced). Since the classes are mutually exclusive, if an instance is accepted as positive, should I reject the same instance as negative and neutral, or simply ignore those options and only keep the accepted instances?

Sorry if this sounds confusing, I’m just not sure how the model weighs rejected instances.

Hi Brian,

The short answer is that we’d normally recommend annotating with one label per session, simply because it’s so much more efficient: you get far fewer clicks per annotation, and the clicking time really adds up.
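For example (the dataset and file names here are just placeholders, not anything from your setup), a one-label pass might look like `prodigy textcat.teach sentiment_pos en_core_web_sm reviews.jsonl --label POSITIVE`, repeated with a different `--label` value for each pass.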

The main caveat is when the labels are actually mutually exclusive. In your sentiment example, presumably only one of positive, negative or neutral should apply. In this case you should add a little data manipulation step in between label passes: if you’ve labelled something positive, you want to go ahead and mark those examples as false for the other two labels (negative and neutral).
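As a rough sketch of that in-between step (assuming Prodigy-style task dicts with `text`, `label` and `answer` keys; the helper name and label values are just for illustration):

```python
LABELS = ["POSITIVE", "NEGATIVE", "NEUTRAL"]

def implied_rejects(example, labels=LABELS):
    """Given an accepted example, yield 'reject' copies for every other label."""
    for other in labels:
        if other != example["label"]:
            rejected = dict(example)  # shallow copy of the task dict
            rejected["label"] = other
            rejected["answer"] = "reject"
            yield rejected
```

After each session, you’d run your accepted examples through something like this and add the results to your dataset alongside the originals.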

On the other hand, not all problems are mutually exclusive. Often you have problems where multiple labels can be true. spaCy’s text categorization model (which Prodigy uses) supports that by default.
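To make that concrete, here’s a rough sketch of how the two setups differ in spaCy (using the v3 component names, which may be newer than the version you’re on; the labels are just your example’s):

```python
import spacy

nlp = spacy.blank("en")

# Mutually exclusive labels: the model picks exactly one label per document.
textcat = nlp.add_pipe("textcat")

# For independent labels, where several can be true at once, you'd use:
# textcat = nlp.add_pipe("textcat_multilabel")

for label in ("POSITIVE", "NEGATIVE", "NEUTRAL"):
    textcat.add_label(label)
```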

Matthew,

Thanks for the response. So essentially, for the sentiment example, you’re saying to convert the labels into a one-hot encoding to enforce mutual exclusivity? Does that mean my dataset shouldn’t include any reject samples, since anything accepted under one label is automatically false for the other labels?

So if I am understanding correctly: go through and add all positive instances to the database, then make another pass through the data to label negative, and so on. During each pass, should I simply ignore any instances that don’t belong to the label of choice (pos/neg/neu)? I understand that if it were a binary classification problem it would be simple to label rejected instances; I’m just confused about how to handle them when there are more than two labels.

Sorry for the delay getting back to you on this, was travelling last week.

I think we might be talking past each other here. Some text classification problems are defined so that the labels are mutually exclusive. For instance, an email is either spam or not spam, but never both. Similarly, you might define your sentiment problem to have the labels positive, negative and neutral, and dictate that if a review is positive, it can’t also be negative or neutral.

For other problems, you might want multiple labels to apply. For instance, you might have topic tags like “science” or “politics”, and some articles might have both tags, if they’re about both topics.

If you know that your labels are mutually exclusive, and you’ve already labelled an example as positive, there’s not much point in separately labelling it as not negative and not neutral as well. Clicking “accept” on positive should simply imply “reject” on the other two labels. You don’t need to do that work manually: you can just add those reject examples automatically.
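A minimal sketch of what that automation could look like, using Prodigy’s Python database API (the dataset name is a placeholder):

```python
from prodigy.components.db import connect

LABELS = ["POSITIVE", "NEGATIVE", "NEUTRAL"]

db = connect()  # connects to the database configured in your prodigy.json
examples = db.get_dataset("sentiment_dataset")  # placeholder dataset name

implied = []
for eg in examples:
    if eg.get("answer") != "accept":
        continue
    # An accept on one label implies a reject on each of the others.
    for other in LABELS:
        if other != eg["label"]:
            rejected = dict(eg)
            rejected["label"] = other
            rejected["answer"] = "reject"
            implied.append(rejected)

db.add_examples(implied, datasets=["sentiment_dataset"])
```

Depending on your version, you may also want to re-hash the copied tasks (see Prodigy’s `set_hashes` helper) so they’re treated as distinct examples rather than duplicates.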

Does that make sense?

Yes, that makes sense. I’m with you as far as adding those examples in manually.

I was just curious because when running the textcat.teach recipe with three labels, you see the same instance three times, once for each label. In that case, would accepting the positive case and rejecting the negative and neutral cases essentially produce the same outcome as manually adding the reject examples?

Yes, those should produce the same results. It’s actually pretty important for the tool that you can programmatically add data, just as though you’d clicked. We wanted to make sure you had full flexibility to automate things.

Awesome! Thanks for the responses, made it much clearer for me.