Best way to label taxonomy/hierarchical data

JAugusto97 · September 18, 2020, 8:24pm

Hi,

I'm working on a task of grouping product attributes into a taxonomy with parent/child relations.
I have a set of product attributes for smartphones, such as battery life, display, screen, build quality and sound quality. I'd like to group them into something like:

Smartphone
--Screen
----Display
------Retina Display
------OLED Display
----Resolution
------Full HD
------4K
----Image Quality
--Sound
----Bass
----Treble

What is the best recipe to use for this? I thought about using Dependencies and Relations, but there are so many aspects that the annotation workflow would become too messy, with lots of arrows. Any suggestions?

Thank you!

ines · September 19, 2020, 10:06am

Hi! I think a lot of it depends on what you're planning on doing with the data later on. Are you going to train a model and if so, what are you going to predict? This also decides whether there's something you can automate, or whether you can restructure the task to make it easier and faster to annotate.

I agree that framing it as a dependencies/relations task is probably overkill here because you're not actually annotating relationships between words in a text, right? If you already know the structured taxonomy, it probably makes sense to also present it this way. One option would be to just add a simple custom HTML UI with some checkboxes – see here for an example: Asking multiple questions in one task (different input types)
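A minimal sketch of the checkbox idea in plain Python, assuming the taxonomy is already known and stored as a nested dict. The `TAXONOMY` data and the `flatten` and `render_checkboxes` helpers are hypothetical names for illustration; the sketch only builds an HTML string and does not show how it would be wired into a Prodigy recipe.

```python
from html import escape

# Hypothetical known taxonomy, based on the example in the first post.
TAXONOMY = {
    "Screen": {"Display": {}, "Resolution": {}, "Image Quality": {}},
    "Sound": {"Bass": {}, "Treble": {}},
}


def flatten(tree, path=()):
    """Yield (path, depth) for every node, parents before their children."""
    for name, children in tree.items():
        node_path = path + (name,)
        yield node_path, len(path)
        yield from flatten(children, node_path)


def render_checkboxes(tree):
    """Build one HTML checkbox per node, indented by its depth in the tree."""
    lines = []
    for node_path, depth in flatten(tree):
        # The full path is used as the value so that equal names
        # under different parents stay distinguishable.
        value = escape("/".join(node_path), quote=True)
        label = escape(node_path[-1])
        lines.append(
            f'<label style="display:block;margin-left:{depth * 20}px">'
            f'<input type="checkbox" name="taxonomy" value="{value}"> {label}</label>'
        )
    return "\n".join(lines)


html_block = render_checkboxes(TAXONOMY)
```

Indenting each checkbox by depth keeps the hierarchy visible to the annotator without drawing any arrows.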

JAugusto97 · September 20, 2020, 3:55pm

Thanks for your answer!

Are you going to train a model and if so, what are you going to predict? This also decides whether there's something you can automate, or whether you can restructure the task to make it easier and faster to annotate.

Yes, I am going to train a model, but I won't rely entirely on it. The model will be a starting point for someone to finish the taxonomy manually. So if possible, I would use Prodigy both to collect data for training the model and to let the annotator adjust the taxonomy manually. I've been looking at this section of NLP-progress for this task (Taxonomy Learning | NLP-progress), and it looks like the prediction is: given a word like "dog", output hypernyms like "canine", "mammal" or "animal".

If you already know the structured taxonomy, it probably makes sense to also present it this way.

The structured taxonomy is not known, I only have the set of words. The task is to group these words into the structured taxonomy. There will be several taxonomies, one for each category of product like Smartphone, Laptop, Headphone and so on. Each of these has a unique set of words that are product attributes.
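To make the hypernym framing concrete, here is a rough sketch, assuming each annotation boils down to a single child -> parent link. The `PARENT` data reuses the smartphone example from my first post, and the `hypernyms` and `build_tree` helpers are hypothetical names, not part of Prodigy or any library.

```python
# Hypothetical child -> parent links for one product category.
PARENT = {
    "Screen": "Smartphone",
    "Display": "Screen",
    "Retina Display": "Display",
    "OLED Display": "Display",
    "Resolution": "Screen",
    "Full HD": "Resolution",
    "4K": "Resolution",
    "Image Quality": "Screen",
    "Sound": "Smartphone",
    "Bass": "Sound",
    "Treble": "Sound",
}


def hypernyms(term, parent=PARENT):
    """Return the ancestors of a term, nearest first."""
    chain = []
    seen = {term}
    while term in parent:
        term = parent[term]
        if term in seen:  # guard against inconsistent annotations
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def build_tree(parent=PARENT):
    """Turn child -> parent links into a nested dict keyed by root terms."""
    nodes = {}
    for child, par in parent.items():
        nodes.setdefault(child, {})
        nodes.setdefault(par, {})
    for child, par in parent.items():
        nodes[par][child] = nodes[child]
    # Roots are the terms that never appear as a child.
    return {name: sub for name, sub in nodes.items() if name not in parent}


print(hypernyms("OLED Display"))
```

With this layout, the annotator only has to pick one parent per word, and the full tree for each product category can be rebuilt from those links afterwards.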
