Two levels of classification for text classification

Hi there,

I'm looking for the best way to assign two levels of classification to each sentence, using the Text Classification (or a similar) recipe. What I want to end up with is a dataset where each sentence has a main category label and a subcategory label.

As far as I can see, the only way to do this in Prodigy is to first create a dataset with the main category labels, and then go through that same set again to label the same sentences with a subcategory. Are there alternative/better ways to achieve my goal?

Hi! This is definitely a solution, and it's something we often recommend for hierarchical label schemes. You can read more about the idea and reasoning here: https://prodi.gy/docs/text-classification#large-label-sets

It does mean you have to make a second pass over the data, but in that second pass you can focus on only the subcategories (which can be much more efficient because it's easier to focus) or use automation specific to each top-level category (for example, to pre-select labels). It also helps while you're developing, because it makes it easier to iterate.
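As a rough sketch of how the second pass could be queued up: the snippet below takes first-pass annotations and builds one choice-style task per accepted top-level label, offering only that category's subcategories. It assumes the first-pass examples carry the selected labels in an "accept" list and the annotator's decision in "answer", as the multiple-choice interface saves them. The hierarchy, the example sentences and the function name `make_second_pass_tasks` are made up for illustration, so adjust them to your own export.

```python
import json

# Hypothetical two-level label scheme (not from the original question).
SUBCATEGORIES = {
    "SPORTS": ["FOOTBALL", "TENNIS"],
    "POLITICS": ["ELECTIONS", "POLICY"],
}


def make_second_pass_tasks(first_pass, subcategories):
    """Yield one choice-style task per accepted top-level label."""
    for eg in first_pass:
        # Skip examples that were rejected or ignored in the first pass.
        if eg.get("answer") != "accept":
            continue
        for category in eg.get("accept", []):
            subs = subcategories.get(category, [])
            if not subs:
                continue  # no subcategories defined for this label
            yield {
                "text": eg["text"],
                # Keep the top-level label with the task so it is not lost.
                "meta": {"category": category},
                "options": [{"id": sub, "text": sub} for sub in subs],
            }


# Stand-in for examples exported from the first-pass dataset.
first_pass = [
    {"text": "The striker scored twice.", "accept": ["SPORTS"], "answer": "accept"},
    {"text": "Nothing applies here.", "accept": [], "answer": "accept"},
    {"text": "A skipped example.", "accept": ["POLITICS"], "answer": "ignore"},
]

second_pass_tasks = list(make_second_pass_tasks(first_pass, SUBCATEGORIES))
for task in second_pass_tasks:
    print(json.dumps(task))
```

Writing the printed lines to a JSONL file gives you a source for the second annotation pass, and because each task only lists the subcategories of one top-level label, the annotator never sees irrelevant options.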

Alternatively, you could also just generate a list of "options" with text like CATEGORY > SUBCATEGORY and list all possibilities in the same task.
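For the flat variant, a minimal sketch could look like the following. It builds the combined options in the `{"id": ..., "text": ...}` shape used by choice-style tasks and attaches the same list to every sentence. The hierarchy, the `" > "` separator and the helper names (`build_options`, `add_options`, `split_label`) are illustrative assumptions, not part of any Prodigy recipe.

```python
# Hypothetical two-level label scheme, for illustration only.
HIERARCHY = {
    "SPORTS": ["FOOTBALL", "TENNIS"],
    "POLITICS": ["ELECTIONS", "POLICY"],
}
SEPARATOR = " > "


def build_options(hierarchy, sep=SEPARATOR):
    """Flatten the hierarchy into one option per CATEGORY > SUBCATEGORY pair."""
    return [
        {"id": f"{cat}{sep}{sub}", "text": f"{cat}{sep}{sub}"}
        for cat, subs in hierarchy.items()
        for sub in subs
    ]


def add_options(texts, options):
    """Attach the full option list to every sentence."""
    for text in texts:
        yield {"text": text, "options": options}


def split_label(option_id, sep=SEPARATOR):
    """Recover (category, subcategory) from a selected option ID."""
    category, subcategory = option_id.split(sep, 1)
    return category, subcategory


flat_options = build_options(HIERARCHY)
flat_tasks = list(add_options(["The striker scored twice."], flat_options))
print(flat_tasks[0]["options"][0]["text"])
print(split_label("POLITICS > POLICY"))
```

Keeping the category and subcategory in a single ID means both levels can be recovered from one annotation afterwards, at the cost of a longer option list per task.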

Finally, you could also do something more custom and add your own checkboxes/radio buttons with an "accordion"-type UI that pops out additional options, depending on what you select and how you want the hierarchy to work. See this thread for details: "Adding Accordion to Choices"

Thanks for your explanation! I'll look into it, but the 'focus' argument really makes me prefer to go for a second pass instead of doing two layers of annotation simultaneously.