Hierarchical text classification troubleshooting

yjwang93 · August 13, 2021, 10:05pm

Hi there,

I apologize if this is a pretty straightforward question, but I am still relatively new to Python. I am trying to create a hierarchical classification system instead of doing iterations of classification. There seems to be a problem with the term "eg['accept']".

kab · August 13, 2021, 10:10pm

Hi! Could you please include your code inline instead of as a screenshot?

If you use 3 backticks (`) at the start and end, this forum can format it for you. (Or you could click the </> icon in the toolbar)

That will allow us to help you better.

yjwang93 · August 13, 2021, 10:29pm

Thank you so much! Let me try this out! I realize some more context for what I am trying to do might be helpful. I am trying to feed in a JSONL line and select a few choices (X, Y, Z). Each of these options would have a subclass that I could further pick from.

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens
from prodigy.util import split_string
import spacy
from typing import List, Optional


# def get_stream(examples, hierarchy):
#     for eg in stream: 
#         top_labels = eg['accepted']
#         for labels in top_labels:
#             sub_labels = hierarchy[labels]
#             options = [{'id': opt, 'text': opt} for opt in sub_labels]
#             # create new example with text and sub labels as options
#             new_eg = {'text': eg['text'], 'options': options}
#             yield new_eg

hierarchy = {'Non_ischemic_cardiomyopathy' : ['dilated_cardiomyopathy', 'sarcoidosis', 'fabry']}

@prodigy.recipe(
    "cardiac-classifier",
    dataset=("Dataset to save answers to", "positional", None, str),
    path_name = ("Path to annotated examples", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string))

def cardiac_classifier(dataset, path_name, label: Optional[List[str]]=None):
    nlp = spacy.load("en_core_web_sm")
    stream = JSONL(path_name)
    stream = get_stream(stream)
    stream = add_options(stream)


    blocks=[
        {"view_id": "text"},
        {"view_id": "text_input", "field_label": "Left Ventricular Ejection Fraction (LVEF)"},
        {"view_id": "choice"}
    ]

    return{
        "dataset": dataset, #needed to save dataset
        "stream": stream,
        "view_id": "blocks",
        "config": {
            "blocks": blocks}
            # "label": hierarchy}
       

    }

def get_stream(stream):
    for eg in stream:
        top_labels = eg['text']
        for label in top_labels:
            sub_labels = hierarchy[label]
            options = [{'id':opt, 'text': opt} for opt in sub_labels]
            new_eg = {'text': eg['text'], 'options': options}
            yield eg

def add_options(stream):
    options = [
        {"id": "dilated_cardiomyopathy", "text": "dilated_cardiomyopathy"},
        {"id": "sarcoidosis", "text": "sarcoidosis"},
        {"id": "fabry", "text": "fabry"}
    ]
    for task in stream:
        task["options"] = options
        yield task

kab · August 17, 2021, 2:43am

Sorry for the late reply here. For hierarchical text classification we typically recommend doing multiple passes over the dataset for each level of the hierarchy. This discussion thread has a lot of good info on this idea: Two levels of classifications for text classifications - #2 by ines
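
A second pass of that kind could look roughly like the sketch below. It assumes the first-pass annotations are available as dictionaries whose selected top-level labels sit under an "accept" key, which is where the choice interface stores selected option IDs. `make_second_pass_stream` and the "parent_label" key are hypothetical names used only for illustration; the hierarchy is the one from the question.

```python
from typing import Dict, Iterable, Iterator, List

# Same hierarchy as in the question: top-level label -> sub-labels
HIERARCHY: Dict[str, List[str]] = {
    "Non_ischemic_cardiomyopathy": ["dilated_cardiomyopathy", "sarcoidosis", "fabry"],
}


def make_second_pass_stream(
    first_pass: Iterable[dict], hierarchy: Dict[str, List[str]]
) -> Iterator[dict]:
    """Yield one choice task per accepted top-level label that has sub-labels."""
    for eg in first_pass:
        # Skip examples that were rejected or ignored in the first pass
        if eg.get("answer", "accept") != "accept":
            continue
        for top_label in eg.get("accept", []):
            sub_labels = hierarchy.get(top_label, [])
            if not sub_labels:
                continue  # leaf label, nothing to refine
            yield {
                "text": eg["text"],
                "parent_label": top_label,  # hypothetical bookkeeping key
                "options": [{"id": s, "text": s} for s in sub_labels],
            }


# Made-up first-pass annotations, only to show the shape of the data
first_pass = [
    {"text": "EF 30%, dilated LV.", "accept": ["Non_ischemic_cardiomyopathy"], "answer": "accept"},
    {"text": "Normal study.", "accept": [], "answer": "accept"},
]
second_pass_tasks = list(make_second_pass_stream(first_pass, HIERARCHY))
```

Each pass stays a simple, static stream this way, so no callback has to modify the stream while annotation is running.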

If you really want to try this async workflow, it is not very straightforward with Prodigy. Internally, Prodigy uses Python generators for the "stream". Prodigy then pulls a batch of examples at a time to be sent out and annotated.
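
To make the batching concrete, here is a plain-Python sketch (not Prodigy's actual implementation) of a consumer pulling fixed-size batches from a generator. `example_stream` and `take_batch` are hypothetical names; the point is only that the consumer holds the generator object and advances it when it needs more examples.

```python
from itertools import islice
from typing import Iterator, List


def example_stream() -> Iterator[dict]:
    """A generator: examples are produced lazily, one at a time."""
    for i in range(5):
        yield {"text": f"report {i}"}


def take_batch(stream: Iterator[dict], batch_size: int) -> List[dict]:
    """Pull up to batch_size examples from the stream, advancing it."""
    return list(islice(stream, batch_size))


stream = example_stream()
first_batch = take_batch(stream, 2)   # consumes examples 0 and 1
second_batch = take_batch(stream, 2)  # continues with examples 2 and 3
```

Because examples are only produced when a batch is requested, anything that should influence later tasks has to reach the generator before the next batch is pulled.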

What you want to do requires that you update that generator based on the response from the most recent answer. So in your script you have a few steps missing to accomplish what you want.

The main component is an update callback that can modify the stream. This would look something like:

(Note: this is pseudo-code)

import copy
import itertools
from typing import List, Optional

import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string

hierarchy = {'Non_ischemic_cardiomyopathy' : ['dilated_cardiomyopathy', 'sarcoidosis', 'fabry']}

@prodigy.recipe(
    "cardiac-classifier",
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("Path to annotated examples", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string)
)
def cardiac_classifier(dataset, source, label: Optional[List[str]]=None):
    nlp = spacy.load("en_core_web_sm")
    stream = JSONL(source)
    stream = get_stream(stream)  # your function that attaches the top-level options
    # stream = add_options(stream)

    def update(answers):
        nonlocal stream  # rebind the recipe's stream instead of creating a local variable
        assert len(answers) == 1  # relies on batch_size 1 below

        last_answer = answers[0]
        # the choice interface stores the selected option IDs under "accept"
        selected = last_answer.get("accept", [])
        options = [sub for top in selected for sub in hierarchy.get(top, [])]
        if not options:
            return  # nothing selected, or the selection has no sub-labels
        sub_task = copy.deepcopy(last_answer)
        del sub_task["accept"]
        # choice options need an "id" and a "text"
        sub_task["options"] = [{"id": o, "text": o} for o in options]
        # put the follow-up task at the front of the stream
        stream = itertools.chain([sub_task], stream)

        # update the model if desired

    blocks=[
        {"view_id": "text"},
        {"view_id": "text_input", "field_label": "Left Ventricular Ejection Fraction (LVEF)"},
        {"view_id": "choice"}
    ]

    return {
        "dataset": dataset, #needed to save dataset
        "stream": stream,
        "update": update,
        "view_id": "blocks",
        "config": {
            "blocks": blocks,
            "batch_size": 1,
            "instant_submit": True
        }
    }
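
One caveat with reassigning the stream inside a callback: in Python, rebinding a variable does not change a generator object that a consumer has already been handed. A common workaround is to let the generator itself check a shared queue that the callback fills. The following is a plain-Python sketch of that pattern and has not been tested against Prodigy; `pending`, the sample texts, and the simulated answer are assumptions for illustration.

```python
import copy
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List

hierarchy: Dict[str, List[str]] = {
    "Non_ischemic_cardiomyopathy": ["dilated_cardiomyopathy", "sarcoidosis", "fabry"],
}

pending: Deque[dict] = deque()  # follow-up tasks queued by update()


def get_stream(source_examples: Iterable[dict]) -> Iterator[dict]:
    """Yield queued follow-up tasks first, then the next source example."""
    for eg in source_examples:
        while pending:
            yield pending.popleft()
        task = dict(eg)
        task["options"] = [{"id": top, "text": top} for top in hierarchy]
        yield task
    # Drain follow-ups created by the last source examples
    while pending:
        yield pending.popleft()


def update(answers: List[dict]) -> None:
    """Queue a sub-label task for every selected label that has children."""
    for answer in answers:
        for top in answer.get("accept", []):
            sub_labels = hierarchy.get(top)
            if not sub_labels:
                continue  # a sub-label was selected, nothing further to ask
            sub_task = copy.deepcopy(answer)
            sub_task.pop("accept", None)
            sub_task.pop("answer", None)
            sub_task["options"] = [{"id": s, "text": s} for s in sub_labels]
            pending.append(sub_task)


# Simulated session: answer one task, then ask the stream for the next one
stream = get_stream([{"text": "EF 30%, dilated LV."}, {"text": "Normal study."}])
first_task = next(stream)
update([{**first_task, "accept": ["Non_ischemic_cardiomyopathy"], "answer": "accept"}])
follow_up = next(stream)  # same text, now with the sub-label options
```

If you adapt this to a recipe, the copied task would likely also need fresh hashes so that it is not filtered out as a duplicate of the task it was copied from.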

Hopefully that helps you in the right direction.

yjwang93 · August 17, 2021, 9:36pm

Thank you for the response @kab !

I think we might go in a different direction and use a dropdown list for the labels that we want to specify. Do you know if this is supported using the choice interface?

Alternatively, is it possible to just create headings for each category in the choice interface?

For example:

Heart attack:

mild
moderate
severe

Where heart attack would not be an option but mild, moderate, and severe would be.

kab · August 17, 2021, 9:41pm

Dropdown lists aren't supported out of the box (although you could build a custom HTML view for this).

Headings wouldn't work without some more custom HTML either, but you could always provide options like:

{"id": "heart-attack:mild", "text": "Heart Attack: Mild"},
{"id": "heart-attack:moderate", "text": "Heart Attack: Moderate"},
{"id": "heart-attack:severe", "text": "Heart Attack: Severe"},
...
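
Options in that shape can be generated from a hierarchy dictionary rather than typed by hand. A minimal sketch, assuming a "parent:child" ID convention with ":" as the separator; `flatten_options` and `split_option_id` are hypothetical helper names, and the example hierarchy is built from the labels above.

```python
from typing import Dict, List, Tuple

hierarchy: Dict[str, List[str]] = {"heart-attack": ["mild", "moderate", "severe"]}


def flatten_options(hierarchy: Dict[str, List[str]], sep: str = ":") -> List[dict]:
    """Build one flat choice option per (parent, child) pair."""
    options = []
    for parent, children in hierarchy.items():
        for child in children:
            options.append({
                "id": f"{parent}{sep}{child}",
                # Human-readable text, e.g. "Heart Attack: Mild"
                "text": f"{parent.replace('-', ' ').title()}: {child.title()}",
            })
    return options


def split_option_id(option_id: str, sep: str = ":") -> Tuple[str, str]:
    """Recover (parent, child) from a flattened option ID."""
    parent, child = option_id.split(sep, 1)
    return parent, child


options = flatten_options(hierarchy)
```

Keeping the parent in the ID means the two levels can be separated again after annotation, for example with `split_option_id("heart-attack:severe")`.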
