Hierarchical text classification troubleshooting

kab · August 17, 2021, 2:43am

Sorry for the late reply here. For hierarchical text classification we typically recommend doing multiple passes over the dataset for each level of the hierarchy. This discussion thread has a lot of good info on this idea: Two levels of classifications for text classifications - #2 by ines
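As a rough illustration of the multi-pass idea, the second pass can be built from the accepted answers of the first. This is only a sketch on plain dicts: the `answer`, `accept` and `options` fields follow the shape of choice-style tasks, while `build_second_pass`, the `parent` key and the sample data are hypothetical.

```python
# Hypothetical mapping from a top-level class to its sub-classes
HIERARCHY = {
    "Non_ischemic_cardiomyopathy": ["dilated_cardiomyopathy", "sarcoidosis", "fabry"],
}


def build_second_pass(first_pass_answers, hierarchy):
    """Yield one follow-up choice task per accepted top-level label with sub-classes."""
    for eg in first_pass_answers:
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        for top_label in eg.get("accept", []):
            sub_labels = hierarchy.get(top_label)
            if not sub_labels:
                continue  # leaf class: nothing more to ask
            yield {
                "text": eg["text"],
                "parent": top_label,  # hypothetical key to remember the first-pass label
                "options": [{"id": s, "text": s} for s in sub_labels],
            }


# Made-up first-pass annotations, for illustration only
first_pass = [
    {"text": "Report A", "answer": "accept", "accept": ["Non_ischemic_cardiomyopathy"]},
    {"text": "Report B", "answer": "reject", "accept": []},
    {"text": "Report C", "answer": "accept", "accept": ["Ischemic_cardiomyopathy"]},
]
second_pass = list(build_second_pass(first_pass, HIERARCHY))
```

Only "Report A" produces a follow-up task here, since it is the only accepted example whose label has sub-classes. The resulting tasks could then be saved and annotated in a separate session.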

If you really want to try this async workflow, it is not very straightforward with Prodigy. Internally, Prodigy uses Python generators for the "stream". Prodigy then pulls a batch of examples at a time to be sent out and annotated.
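To see why this matters, here is a small plain-Python sketch of a consumer pulling batches from a generator. `take_batch` is a hypothetical stand-in for the batching behavior described above, not Prodigy's actual implementation.

```python
from itertools import islice


def stream():
    """A lazy stream: each example is produced only when the consumer asks for it."""
    for i in range(5):
        yield {"text": f"example {i}"}


def take_batch(gen, size):
    """Pull up to `size` examples from the generator (hypothetical helper)."""
    return list(islice(gen, size))


gen = stream()
first = take_batch(gen, 2)   # examples 0 and 1 have left the generator
second = take_batch(gen, 2)  # examples 2 and 3
rest = take_batch(gen, 2)    # only example 4 remains
```

Once a batch has been pulled, nothing done to the generator afterwards can change what is in it. That is why reacting to each individual answer only makes sense with a very small batch size.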

What you want to do requires that you update that generator based on the response from the most recent answer. So in your script you have a few steps missing to accomplish what you want.
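One way to picture this, independent of Prodigy: the generator and the callback share a queue, and the generator checks that queue before moving on to the next source example. The names below (`pending`, `on_answer`, `feedback_stream`) are illustrative assumptions, and the loop simply simulates an annotator answering one task at a time.

```python
from collections import deque

# Hypothetical hierarchy: top-level class -> sub-classes
SUB_LABELS = {"Non_ischemic_cardiomyopathy": ["dilated_cardiomyopathy", "sarcoidosis", "fabry"]}
pending = deque()  # follow-up tasks queued by the callback


def feedback_stream(examples):
    """Yield source examples, serving any queued follow-ups first."""
    for eg in examples:
        while pending:
            yield pending.popleft()
        yield eg
    while pending:
        yield pending.popleft()


def on_answer(answer):
    """Queue a follow-up task if the answered label has sub-classes."""
    subs = SUB_LABELS.get(answer.get("label"))
    if subs:
        pending.append({"text": answer["text"], "options": subs})


source = [
    {"text": "doc 1", "label": "Non_ischemic_cardiomyopathy"},
    {"text": "doc 2", "label": "other"},
]
seen = []
for task in feedback_stream(source):
    seen.append(task)
    on_answer(task)  # simulate: the answer arrives before the next task is pulled
```

In this simulation the follow-up for "doc 1" is served right after "doc 1" and before "doc 2". That ordering only holds if each answer is processed before the next example is pulled.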

The main component is an update callback that can modify the stream. This would look something like:

(Note: this is pseudo-code)

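import copy
from collections import deque
from typing import List, Optional

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string

# Top-level class -> sub-classes to ask about in a follow-up task
hierarchy = {'Non_ischemic_cardiomyopathy': ['dilated_cardiomyopathy', 'sarcoidosis', 'fabry']}

@prodigy.recipe(
    "cardiac-classifier",
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("Path to annotated examples", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string)
)
def cardiac_classifier(dataset, source, label: Optional[List[str]] = None):
    # Follow-up tasks queued by update(). Rebinding a local `stream` name
    # inside update() would not change the generator already returned below,
    # so the generator reads from this shared queue instead.
    pending = deque()

    def get_stream():
        for eg in JSONL(source):
            # Serve sub-tasks created from earlier answers first
            while pending:
                yield pending.popleft()
            yield eg
        while pending:
            yield pending.popleft()

    stream = get_stream()
    # stream = add_options(stream)

    def update(answers):
        # With batch_size 1 and instant_submit, answers arrive one at a time
        assert len(answers) == 1

        last_answer = answers[0]
        # Assumes the answer carries the chosen top-level class under "label"
        options = hierarchy.get(last_answer.get("label"))
        if not options:
            return  # no sub-classes (or already a sub-task): nothing to queue
        sub_task = copy.deepcopy(last_answer)
        del sub_task["label"]
        sub_task.pop("answer", None)
        sub_task["options"] = [{"id": o, "text": o} for o in options]
        # Re-hash so the sub-task is not treated as a duplicate of its parent
        sub_task = prodigy.set_hashes(sub_task, overwrite=True)
        pending.append(sub_task)

        # update the model if desired

    blocks = [
        {"view_id": "text"},
        {"view_id": "text_input", "field_label": "Left Ventricular Ejection Fraction (LVEF)"},
        {"view_id": "choice"}
    ]

    return {
        "dataset": dataset,  # needed to save dataset
        "stream": stream,
        "update": update,
        "view_id": "blocks",
        "config": {
            "blocks": blocks,
            "batch_size": 1,
            "instant_submit": True
        }
    }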
Hopefully that points you in the right direction.
