Best labeling strategy for hierarchical concepts


Imagine you have a hierarchically structured concept space and you want to label text with respect to this space. To be concrete, imagine 4 top-level concepts, each having 5 sub-concepts, for a total of 24 possible labels (4 top-level labels plus 20 sub-concept labels).

What would be the best approach for labeling text for this concept space?

For instance, would it be more effective to first label text using the 4 (multi)labels at the top of the hierarchy? Or would it be a better labeling practice to perform 4 separate labeling processes each using a binary strategy?

In general, would it be better to perform 24 binary labeling tasks or 5 multiple-choice labeling tasks (one for the top level and one for each top-level concept)?
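To make the arithmetic in the question concrete, here is a small sketch that builds the example hierarchy and counts both kinds of decisions per text. The label names (`TOP_A`, `TOP_A_SUB_1`, …) are hypothetical placeholders, not anything from the thread.

```python
# Hypothetical placeholder hierarchy: 4 top-level concepts, 5 sub-concepts each.
HIERARCHY = {
    f"TOP_{t}": [f"TOP_{t}_SUB_{s}" for s in range(1, 6)]
    for t in "ABCD"
}

def all_labels(hierarchy):
    """Every label in the concept space: top-level labels plus their sub-labels."""
    labels = list(hierarchy)
    for subs in hierarchy.values():
        labels.extend(subs)
    return labels

def choice_groups(hierarchy):
    """One multiple-choice question over the top level, plus one per top-level concept."""
    groups = [("ROOT", list(hierarchy))]
    groups.extend((top, subs) for top, subs in hierarchy.items())
    return groups

print(len(all_labels(HIERARCHY)))     # 24 binary yes/no decisions per text
print(len(choice_groups(HIERARCHY)))  # 5 multiple-choice decisions per text
```

The trade-off is many quick decisions (24) versus fewer but harder ones (5, each with 4 or 5 options to weigh).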

Thanks for your thoughts!


I would start with binary labels if possible, since they're much easier to annotate quickly and accurately. With binary labels you only have to decide "does this match the label, yes or no?" instead of "which of n groups does this belong to?". This is especially helpful if your concepts overlap or are ambiguous in ways that make it hard to choose one label over another.
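A minimal sketch of the binary approach, assuming a generic setup rather than any particular annotation tool: every (text, label) pair becomes its own yes/no question, and the accepted answers are merged back into a set of labels per text. `BinaryTask`, `binary_tasks`, and `collect_labels` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Set

@dataclass
class BinaryTask:
    """One yes/no question: does `text` match `label`?"""
    text: str
    label: str
    answer: Optional[bool] = None  # filled in by the annotator

def binary_tasks(texts: Iterable[str], labels: List[str]) -> Iterator[BinaryTask]:
    """Yield one yes/no task per (text, label) pair."""
    for text in texts:
        for label in labels:
            yield BinaryTask(text, label)

def collect_labels(tasks: Iterable[BinaryTask]) -> Dict[str, Set[str]]:
    """Merge answered tasks into accepted labels per text.
    Several labels can be accepted for the same text, so overlapping concepts are fine."""
    accepted: Dict[str, Set[str]] = {}
    for task in tasks:
        accepted.setdefault(task.text, set())
        if task.answer:
            accepted[task.text].add(task.label)
    return accepted

tasks = list(binary_tasks(["example text one"], ["TOP_A", "TOP_B", "TOP_C"]))
# Simulated (hypothetical) annotator answers: yes, yes, no.
for task, answer in zip(tasks, [True, True, False]):
    task.answer = answer
print(sorted(collect_labels(tasks)["example text one"]))  # ['TOP_A', 'TOP_B']
```

Because each question is independent, an ambiguous text can simply be accepted for two labels instead of forcing a single choice.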

Your example isn't super clear to me. Perhaps you could give a more concrete one if you want further thoughts?


There was a similar question the other day, and I wrote up some ideas for a workflow that starts with the top-level categories and moves on to the sub-categories, using a custom recipe that creates multiple-choice tasks automatically:

But as Justin mentioned, you might find that structuring it as a binary task is actually much better and faster, especially for the top-level decisions.
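A hybrid of the two ideas can be sketched as follows: binary yes/no questions for the top-level concepts, then one multiple-choice question over the sub-concepts, asked only for the top-level concepts that were accepted. This is not the custom recipe mentioned above; the hierarchy and the `ask_binary` / `ask_choice` callbacks are hypothetical stand-ins for whatever annotation interface you use.

```python
from typing import Callable, Dict, List

# Hypothetical placeholder hierarchy: 4 top-level concepts, 5 sub-concepts each.
HIERARCHY: Dict[str, List[str]] = {
    f"TOP_{t}": [f"TOP_{t}_SUB_{s}" for s in range(1, 6)] for t in "ABCD"
}

def annotate_hierarchically(
    text: str,
    hierarchy: Dict[str, List[str]],
    ask_binary: Callable[[str, str], bool],
    ask_choice: Callable[[str, List[str]], List[str]],
) -> Dict[str, List[str]]:
    """Stage 1: one yes/no question per top-level concept.
    Stage 2: for each accepted top-level concept only, one
    multiple-choice question over its sub-concepts."""
    result: Dict[str, List[str]] = {}
    for top, subs in hierarchy.items():
        if ask_binary(text, top):
            result[top] = ask_choice(text, subs)
    return result

questions = []  # records every question the simulated annotator is asked

def fake_binary(text, label):
    questions.append(("binary", label))
    return label == "TOP_B"  # pretend only TOP_B applies

def fake_choice(text, options):
    questions.append(("choice", tuple(options)))
    return [options[0]]  # pretend the first sub-concept applies

print(annotate_hierarchically("some text", HIERARCHY, fake_binary, fake_choice))
# {'TOP_B': ['TOP_B_SUB_1']}
print(len(questions))  # 5: four top-level yes/no questions plus one sub-level choice
```

The saving comes from stage 2: sub-concepts of rejected top-level concepts are never shown, so a text that matches one branch needs 4 + 1 questions rather than all 24 binary ones.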

Thank you both. Very helpful ideas. If I need more, I’ll provide a concrete example.