Best way to label taxonomy/hierarchical data

Hi,

I'm working on grouping product attributes into a taxonomy with parent/child relations.
I have a set of product attributes for smartphones, such as battery life, display, screen, build quality and sound quality, and I'd like to group them into something like this:

Smartphone
--Screen
----Display
------Retina Display
------OLED Display
----Resolution
------Full HD
------4K
----Image Quality
--Sound
----Bass
----Treble

What is the best recipe to use for this? I thought about using Dependencies and Relations, but there are too many aspects, and the annotation workflow would become too messy with lots of arrows. Any suggestions?

Thank you!

Hi! I think a lot of it depends on what you're planning on doing with the data later on. Are you going to train a model and if so, what are you going to predict? This also decides whether there's something you can automate, or whether you can restructure the task to make it easier and faster to annotate.

I agree that framing it as a dependencies/relations task is probably overkill here because you're not actually annotating relationships between words in a text, right? If you already know the structured taxonomy, it probably makes sense to also present it this way. One option would be to just add a simple custom HTML UI with some checkboxes – see here for an example: Asking multiple questions in one task (different input types)
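To make the "present it as a structured choice" idea a bit more concrete, here is a minimal sketch that turns each attribute into a multiple-choice question ("which node is its parent?"). This swaps in a choice-style task in place of the custom HTML checkboxes mentioned above. The `text` / `options` / `id` layout is what I'd expect Prodigy's `choice` interface to accept, but treat that as an assumption and check it against the docs; `make_parent_tasks` and the example attribute lists are hypothetical names made up for illustration.

```python
import json


def make_parent_tasks(attributes, candidate_parents):
    """Yield one multiple-choice task per attribute.

    Each task asks which candidate node is the parent of the attribute.
    An attribute is never offered as its own parent.
    """
    for attr in attributes:
        options = [{"id": p, "text": p} for p in candidate_parents if p != attr]
        yield {"text": attr, "options": options}


# Hypothetical example data based on the smartphone outline in the question
attributes = ["Display", "Resolution", "Bass", "Treble"]
candidate_parents = ["Smartphone", "Screen", "Sound"]

tasks = list(make_parent_tasks(attributes, candidate_parents))

# Serialize as JSONL (one JSON object per line) so it can be loaded as a stream
jsonl = "\n".join(json.dumps(task) for task in tasks)
print(jsonl)
```

The annotator then only picks a parent per attribute, so there are no arrows to draw, and the selected option IDs can be read back as child/parent pairs.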

Thanks for your answer!

Are you going to train a model and if so, what are you going to predict? This also decides whether there's something you can automate, or whether you can restructure the task to make it easier and faster to annotate.

Yes, I am going to train a model, but I won't rely entirely on it. The model will be a starting point for someone to finish the taxonomy manually. So if possible, I would use Prodigy both to collect data for training the model and to let the annotator adjust the taxonomy manually. I've been looking at the NLP-progress section on this task (Taxonomy Learning | NLP-progress), and it looks like the prediction is: given a word like "dog", output hypernyms like "canine", "mammal" or "animal".
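As a rough sketch of what that prediction target looks like in data terms: if the taxonomy is stored as a child-to-parent mapping, the hypernyms of a word are just the chain of ancestors. The `hypernyms` helper and the `parent_of` mapping below are hypothetical names, and the entries are taken from the smartphone outline above rather than from any real dataset.

```python
def hypernyms(term, parent_of):
    """Return the ancestors of `term`, nearest first, by walking up the tree."""
    chain = []
    seen = {term}
    while term in parent_of:
        term = parent_of[term]
        if term in seen:
            # Guard against accidental cycles in manually annotated data
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


# Child -> parent mapping for part of the smartphone outline
parent_of = {
    "Screen": "Smartphone",
    "Display": "Screen",
    "Retina Display": "Display",
    "OLED Display": "Display",
    "Sound": "Smartphone",
    "Bass": "Sound",
}

print(hypernyms("Retina Display", parent_of))
# ['Display', 'Screen', 'Smartphone']
```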

If you already know the structured taxonomy, it probably makes sense to also present it this way.

The structured taxonomy is not known; I only have the set of words. The task is to group these words into a structured taxonomy. There will be several taxonomies, one for each product category, such as Smartphone, Laptop, Headphone and so on. Each of these has its own set of words that are product attributes.
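To illustrate the output I have in mind, here is a minimal sketch that takes annotated (child, parent) pairs per product category and renders each as the dashed outline used above. The function names and the example pairs (including the Headphone ones) are made up for illustration; how the pairs are collected is left open.

```python
from collections import defaultdict


def build_children(pairs):
    """Map each parent to its list of children, keeping annotation order."""
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)
    return children


def render(root, children, depth=0):
    """Return the outline lines for `root`, two dashes per nesting level."""
    lines = ["--" * depth + root]
    for child in children.get(root, []):
        lines.extend(render(child, children, depth + 1))
    return lines


# Hypothetical annotations: one list of (child, parent) pairs per category
annotations = {
    "Smartphone": [
        ("Screen", "Smartphone"),
        ("Display", "Screen"),
        ("Sound", "Smartphone"),
        ("Bass", "Sound"),
    ],
    "Headphone": [
        ("Sound", "Headphone"),
        ("Bass", "Sound"),
    ],
}

outlines = {
    category: render(category, build_children(pairs))
    for category, pairs in annotations.items()
}
print("\n".join(outlines["Smartphone"]))
```

Keeping one set of pairs per category means the same word (for example "Sound") can sit in different places in different taxonomies.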