Best labeling strategy for hierarchical concepts

rickdeshon · March 1, 2019, 3:48pm

Hello.

Imagine you have a hierarchically structured concept space and you want to label text with respect to this space. To be concrete, imagine 4 top-level concepts, each with 5 sub-concepts, for a total of 24 possible labels (4 top-level labels plus 20 sub-labels).

What would be the best approach for labeling text for this concept space?

For instance, would it be more effective to first label the text with the 4 top-level (multi-)labels? Or would it be better practice to run 4 separate labeling processes, each using a binary strategy?

In general, would it be better to perform 24 binary labeling tasks (one per label) or 5 multi-label tasks (one over the top level, plus one for each top-level concept's sub-concepts)?
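The arithmetic behind those two counts can be sketched in Python. The label names are hypothetical placeholders and the helpers are illustrative, not part of any library: a flat binary scheme asks one yes/no question per label (4 + 4 × 5 = 24), while a cascaded multiple-choice scheme asks one question over the top level plus one per top-level concept (1 + 4 = 5).

```python
# Hypothetical hierarchy: 4 top-level concepts, each with 5 sub-concepts.
# The names are placeholders, not taken from the original post.
HIERARCHY = {
    f"TOP_{i}": [f"TOP_{i}_SUB_{j}" for j in range(1, 6)]
    for i in range(1, 5)
}


def all_labels(hierarchy):
    """Flatten the hierarchy into one list: parents first, then their children."""
    labels = list(hierarchy)
    for children in hierarchy.values():
        labels.extend(children)
    return labels


def count_binary_tasks(hierarchy):
    """One yes/no decision per label, whether it is a parent or a child."""
    return len(all_labels(hierarchy))


def count_choice_tasks(hierarchy):
    """One multiple-choice question over the top level, plus one per parent."""
    return 1 + len(hierarchy)


print(count_binary_tasks(HIERARCHY))  # 24
print(count_choice_tasks(HIERARCHY))  # 5
```

These are totals across the whole scheme; in the cascaded version a given text only needs the sub-level questions for the top-level concepts that actually apply to it.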

Thanks for your thoughts!

Rick

justindujardin · March 1, 2019, 4:30pm

I would start with binary labels if possible, since they're much easier to annotate quickly and accurately. With binary labels you only have to decide "does this match the label, yes or no?" instead of "which of the n groups does this belong to?". This is especially helpful if your concepts overlap or are ambiguous enough that it's hard to choose one label over another.
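A minimal sketch of what a stream of binary questions could look like, assuming a plain list of texts and labels. The dictionaries are loosely modeled on the `text`/`label` tasks that Prodigy's classification interface displays, but `binary_tasks` is a hypothetical helper, not a Prodigy API.

```python
from typing import Dict, Iterable, Iterator, List


def binary_tasks(texts: Iterable[str], labels: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one yes/no question per (text, label) pair.

    Labels form the outer loop, so all questions about one concept
    come before any question about the next concept.
    """
    texts = list(texts)  # materialize so the texts can be reused for every label
    for label in labels:
        for text in texts:
            yield {"text": text, "label": label}


tasks = list(binary_tasks(["text A", "text B"], ["TOP_1", "TOP_2"]))
print(len(tasks))  # 4
print(tasks[0])    # {'text': 'text A', 'label': 'TOP_1'}
```

Grouping by label means the annotator keeps answering the same question for a whole batch, which is the "one label per labeling process" setup from the original question.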

Your example isn't super clear to me; perhaps you can give a more concrete one if you want further thoughts?

ines · March 1, 2019, 6:12pm

There was a similar question the other day, and I wrote up some ideas for a workflow that starts with the top-level categories and then moves on to the subcategories, using a custom recipe that creates multiple-choice tasks automatically:

But as Justin mentioned, you might find that structuring it as a binary task is actually much better and faster, especially for the top-level decisions.
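A rough sketch of the second stage of such a top-down workflow, assuming the top-level pass has already produced (text, label) pairs. The option format is loosely modeled on Prodigy's choice interface (an `options` list of `id`/`text` entries); the `subcategory_tasks` helper, the `parent` key, the label names and the example data are hypothetical, and this is not the custom recipe mentioned in the post.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical two-level hierarchy (placeholder names).
HIERARCHY: Dict[str, List[str]] = {
    "TOP_1": ["TOP_1_SUB_1", "TOP_1_SUB_2", "TOP_1_SUB_3"],
    "TOP_2": ["TOP_2_SUB_1", "TOP_2_SUB_2"],
}


def subcategory_tasks(
    accepted: Iterable[Tuple[str, str]],
    hierarchy: Dict[str, List[str]],
) -> Iterator[dict]:
    """Turn accepted (text, top_label) pairs into multiple-choice tasks.

    Each task only offers the children of the top-level label accepted
    in the first pass, so annotators never see unrelated sub-labels.
    """
    for text, top_label in accepted:
        options = [{"id": sub, "text": sub} for sub in hierarchy.get(top_label, [])]
        if options:  # skip unknown parents or parents without sub-concepts
            yield {"text": text, "parent": top_label, "options": options}


# Made-up output of a first, top-level pass.
first_pass = [("text A", "TOP_1"), ("text B", "TOP_2"), ("text A", "TOP_2")]
stage_two = list(subcategory_tasks(first_pass, HIERARCHY))
print(len(stage_two))                              # 3
print([o["id"] for o in stage_two[1]["options"]])  # ['TOP_2_SUB_1', 'TOP_2_SUB_2']
```

Because a text can be accepted for more than one top-level concept, it can produce more than one stage-two task ("text A" yields two here), which keeps the hierarchy multi-label without showing all 20 sub-labels at once.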

rickdeshon · March 1, 2019, 6:43pm

Thank you both. Very helpful ideas. If I need more, I’ll provide a concrete example.

Best!

Rick

Related topics

Topic	Replies	Views	Activity
Multilabel text classification with more than 200 labels (usage, textcat)	1	702	January 19, 2022
textcat.teach for multi-class classification (textcat)	3	515	June 19, 2023
Two levels of classifications for text classifications (usage, textcat, custom, front-end)	2	863	October 20, 2020
Multi-label text classification with many labels (usage, textcat)	7	2414	June 30, 2020
hierarchical text classification using spancat and potentially expanding/hiding label subclasses as they come in context (textcat, front-end, spancat)	6	473	September 21, 2022