Hierarchical labels

Can you help me create a custom recipe with hierarchical labels?

Specifically, my hierarchy would look like the following, with 3 levels in each category:

Politics
- Asia
-- elections
-- foreign
- Europe
-- local
-- election
-- finance
- America
-- business
-- finance
Sports
- Team
-- football
-- baseball
- Individual
-- golf
-- tennis
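To make the three levels concrete in code, the hierarchy can be stored as a nested dict (the same shape as the LABELS dict in the second recipe further down) and flattened into full paths. Full paths matter here because some leaf names repeat: "finance" appears under both Europe and America. This is a minimal stdlib sketch; the `leaf_paths` helper and the `|` separator are illustrative choices, not part of Prodigy.

```python
# The hierarchy above as a nested dict: category -> subcategory -> list of leaf labels.
HIERARCHY = {
    "Politics": {
        "Asia": ["elections", "foreign"],
        "Europe": ["local", "election", "finance"],
        "America": ["business", "finance"],
    },
    "Sports": {
        "Team": ["football", "baseball"],
        "Individual": ["golf", "tennis"],
    },
}


def leaf_paths(hierarchy, sep="|"):
    """Return every leaf as a full 'level1|level2|level3' path, in definition order."""
    paths = []
    for level1, subcategories in hierarchy.items():
        for level2, leaves in subcategories.items():
            for level3 in leaves:
                paths.append(sep.join((level1, level2, level3)))
    return paths


if __name__ == "__main__":
    paths = leaf_paths(HIERARCHY)
    print(len(paths))  # number of leaf labels across both categories
    # "finance" occurs twice; the full paths keep the two labels distinct.
    print([p for p in paths if p.endswith("|finance")])
```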

My sample file, example.jsonl, has the following two records to annotate:

{ "text": "Cyber researchers have linked the vulnerability exploited by the latest ransomware to “WannaCry”. Both versions of malicious software rely on weaknesses discovered by the National Security Agency years ago, Kaspersky said." }

{ "text": "Java J****EE Developer ****k k Music, Film & TV London Java JEE Developers required for software house with client sectors of music, film and TV. Salary: Maximum ****: Discretionary bonus and benefits package. Location: Near Euston and King's Cross, London THE COMPANY: Consistent new business wins for the world leader in the provision of software solutions to the Music and Entertainment industry has given rise to the need for an experienced Java Developer. The working environment here is very pleasant with a casual dress code, laid back and friendly atmosphere"}

My recipe file looks as follows:

import prodigy

LABEL_HIERARCHY = {
    "CATEGORY": {
        "Politics": {
            "Asia": {
                "elections":{},
                "foreign":{}
            },
            "Elections": {
                "A":{}, 
                "B":{}
            }
        }
    }
}

def add_label_hierarchy(examples):
    for eg in examples:
        eg["hierarchy"] = LABEL_HIERARCHY
    return examples

@prodigy.recipe('hierarchy_recipe')
def hierarchy_recipe(dataset, source):
    stream = prodigy.get_stream(source)
    stream = add_label_hierarchy(stream)
    return {
        'dataset': dataset,
        'stream': stream,
        'view_id': 'hierarchy_recipe',
        'config': {
            'labels': None
        }
    }

@prodigy.serve('hierarchy_recipe')
def hierarchy_recipe():
    html_template = """
        <div style="font-size: 16px">
            <p>Text: <strong>{{text}}</strong></p>
            <p>Labels:</p>
            <ul>
            {% for level1 in hierarchy %}
                <li>{{ level1 }}
                    {% if hierarchy[level1] %}
                        <ul>
                        {% for level2 in hierarchy[level1] %}
                            <li>{{ level2 }}
                                {% if hierarchy[level1][level2] %}
                                    <ul>
                                    {% for level3 in hierarchy[level1][level2] %}
                                        <li><input type="checkbox" name="{{ level1 }}|{{ level2 }}|{{ level3 }}" value="accept">{{ level3 }}</li>
                                    {% endfor %}
                                    </ul>
                                {% endif %}
                            </li>
                        {% endfor %}
                        </ul>
                    {% endif %}
                </li>
            {% endfor %}
            </ul>
        </div>
    """
    return {'html_template': html_template}

I am running this command: prodigy hierarchy_recipe -F hierarchy_recipe.py test_dataset example.jsonl

Getting this error: usage: prodigy hierarchy_recipe [-h] dataset source

prodigy hierarchy_recipe: error: the following arguments are required: dataset, source
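This error most likely comes from the second decorator rather than from the command itself. `prodigy.serve()` starts a recipe from a command string; it is not a decorator. Python evaluates the expression after `@` as soon as the `def` statement runs, so loading the file with `-F` calls `prodigy.serve('hierarchy_recipe')` with no dataset or source, which would explain the "arguments are required" message. Removing the `@prodigy.serve(...)` function, and instead putting the template under the recipe's "config" as "html_template" with a built-in interface such as `view_id: "html"` (there is no built-in view called "hierarchy_recipe"), should get past it. The stdlib sketch below only demonstrates the Python rule involved; `serve_like` is a hypothetical stand-in, not Prodigy code.

```python
calls = []
message = None


def serve_like(command):
    """Hypothetical stand-in for a function that *runs* a command string.

    Like prodigy.serve, it is meant to be called, not used as a decorator.
    """
    calls.append(command)
    args = command.split()[1:]  # everything after the recipe name
    if len(args) < 2:
        # Mirrors the CLI error: the recipe needs both a dataset and a source.
        name = command.split()[0]
        raise SystemExit(f"{name}: error: the following arguments are required: dataset, source")
    return None


try:
    # serve_like("hierarchy_recipe") is evaluated right here, when the def
    # statement runs (i.e. when the file is loaded), not when anything is served.
    @serve_like("hierarchy_recipe")
    def hierarchy_template():
        return {"html_template": "<p>{{text}}</p>"}
except SystemExit as err:
    message = str(err)

print(calls)
print(message)
```

Because the decorator expression fails first, the decorated function is never even bound to its name.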

I have also tried the recipe below, but I get this error: AttributeError: module 'prodigy.components.printers' has no attribute 'print_total_annotations'

import prodigy
from prodigy.components import printers
from prodigy.util import log

LABELS = {
    "Politics": {
        "Asia": ["elections", "foreign"],
        "Europe": ["local", "election", "finance"],
        "America": ["business", "finance"]
    },
    "Sports": {
        "Team": ["football", "baseball"],
        "Individual": ["golf", "tennis"]
    }
}

@prodigy.recipe("hierarchical_labels",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str))
def hierarchical_labels(dataset, source):
    def update(examples):
        for eg in examples:
            # add an empty "label" property to each example
            eg["label"] = ""
        return examples

    stream = prodigy.get_stream(source)
    stream = update(stream)
    
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "classification",
        "config": {
            "labels": LABELS,
            "choice_style": "multiple",
            "instructions": "Categorize the text into one or more categories"
        },
        "update": update,
        "on_exit": printers.print_total_annotations,
        "log_level": log.WARNING
    }

Hi @russel!

Thanks for your question and welcome to the Prodigy community :wave:

You're getting this error because prodigy.components.printers doesn't have an attribute named print_total_annotations :slight_smile:
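If the aim of that "on_exit" line was only to print a summary when the server stops, a small function of your own can stand in for it. Prodigy does accept an "on_exit" callback in the dict a recipe returns, but how you get hold of the saved examples inside it depends on your setup, so this sketch covers only the counting part. It assumes a list of annotated task dicts carrying Prodigy's usual "answer" field ("accept", "reject" or "ignore"); `count_answers` and `print_answer_counts` are hypothetical names.

```python
from collections import Counter


def count_answers(examples):
    """Count annotated examples by their "answer" value (accept / reject / ignore)."""
    return Counter(eg.get("answer", "unanswered") for eg in examples)


def print_answer_counts(examples):
    """Hypothetical replacement for the non-existent printers.print_total_annotations."""
    counts = count_answers(examples)
    print(f"Total annotations: {sum(counts.values())}")
    for answer, n in sorted(counts.items()):
        print(f"  {answer}: {n}")
    return counts


if __name__ == "__main__":
    sample = [
        {"text": "a", "answer": "accept"},
        {"text": "b", "answer": "reject"},
        {"text": "c", "answer": "accept"},
    ]
    print_answer_counts(sample)
```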

Just curious: did you create these Prodigy recipes with a generative model like ChatGPT? I ask because I noticed some odd things in them. For example, some keys don't exist (like "log_level"), "config": {"labels": ...} doesn't take a nested dict, and others are used incorrectly (e.g., "instructions" expects a path to a file containing the instructions, not the instruction text itself).

Other posts on this forum suggest that the best way to handle hierarchical labels is to annotate in multiple passes (e.g., start with the top-level categories, then confirm the lower levels).
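A rough sketch of that multi-pass idea, assuming Prodigy's "choice" interface, where each task carries "options" as a list of {"id", "text"} dicts and a saved answer lists the selected option ids under "accept". Pass 1 asks only for the top level; each later pass turns every accepted option into a new task whose options are that option's children. The function names are hypothetical, and in practice each pass would be a separate recipe run saving to its own dataset.

```python
# Hierarchy from the question: category -> subcategory -> leaf labels.
HIERARCHY = {
    "Politics": {
        "Asia": ["elections", "foreign"],
        "Europe": ["local", "election", "finance"],
        "America": ["business", "finance"],
    },
    "Sports": {
        "Team": ["football", "baseball"],
        "Individual": ["golf", "tennis"],
    },
}


def options_for(names, prefix=""):
    """Build choice options; the id is the full path so repeated names stay unique."""
    return [{"id": f"{prefix}{name}", "text": name} for name in names]


def children_of(hierarchy, path):
    """Walk a 'Politics|Asia'-style path down the nested dict and return its children."""
    node = hierarchy
    for name in path.split("|"):
        node = node[name]
    return node


def first_pass(examples, hierarchy):
    """Pass 1: one task per text, asking only for the top-level category."""
    for eg in examples:
        yield {"text": eg["text"], "options": options_for(hierarchy)}


def next_pass(answers, hierarchy):
    """Later passes: one task per accepted option, offering that option's children."""
    for eg in answers:
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored tasks
        for parent in eg.get("accept", []):
            yield {
                "text": eg["text"],
                "meta": {"parent": parent},
                "options": options_for(children_of(hierarchy, parent), prefix=f"{parent}|"),
            }
```

Using the full path as the option id keeps "Politics|Europe|finance" and "Politics|America|finance" apart when the passes are merged later.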

Let me know if this makes sense.

Thank you, Ryan, and yes, you are right: I used GPT first to get started.

I am trying to build a three-level hierarchical list of keywords that annotators can use to tag each example in the data stream. When an annotator selects a first-level keyword, the front-end should expand to show the next level, and so on.
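One way to approximate that expanding behavior, sketched under assumptions: Prodigy's "html_template" is filled in with Mustache-style {{variables}}, so Jinja tags like {% for %} and {% if %} from the first recipe would not be evaluated. Instead, the nested markup can be built in Python and stored in each task's "html" field for the "html" interface. Native <details>/<summary> elements expand on click without any JavaScript. Note that a level opens when its name is clicked, not when a checkbox is ticked, and saving the ticked boxes with the annotation would still need custom JavaScript in the recipe, which is not shown. `render_collapsible` and `add_html` are hypothetical names.

```python
from html import escape

HIERARCHY = {
    "Politics": {
        "Asia": ["elections", "foreign"],
        "Europe": ["local", "election", "finance"],
        "America": ["business", "finance"],
    },
    "Sports": {
        "Team": ["football", "baseball"],
        "Individual": ["golf", "tennis"],
    },
}


def render_collapsible(node, path=()):
    """Render dict levels as collapsible <details> blocks and list leaves as checkboxes."""
    if isinstance(node, dict):
        parts = []
        for name, child in node.items():
            inner = render_collapsible(child, path + (name,))
            parts.append(
                f"<details><summary>{escape(name)}</summary>"
                f'<div style="margin-left: 1em">{inner}</div></details>'
            )
        return "".join(parts)
    # Leaf level: one checkbox per label, named by its full 'level1|level2|level3' path.
    parts = []
    for leaf in node:
        name = escape("|".join(path + (leaf,)))
        parts.append(f'<label><input type="checkbox" name="{name}"> {escape(leaf)}</label><br>')
    return "".join(parts)


def add_html(examples, hierarchy):
    """Yield each task with its text and the collapsible label tree in task["html"]."""
    tree = render_collapsible(hierarchy)
    for eg in examples:
        eg["html"] = f"<p>{escape(eg['text'])}</p>{tree}"
        yield eg
```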