Sorry for the late reply here. For hierarchical text classification we typically recommend doing multiple passes over the dataset for each level of the hierarchy. This discussion thread has a lot of good info on this idea: Two levels of classifications for text classifications - #2 by ines
If you really want to try this async workflow, it is not very straightforward with Prodigy. Internally, Prodigy uses Python generators for the "stream". Prodigy then pulls a batch of examples at a time to be sent out and annotated.
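To make that concrete, here is a tiny plain-Python sketch of what "pulling a batch from a generator" means (illustrative only, not actual Prodigy internals):

```python
import itertools

def stream():
    # A Prodigy "stream" is just a generator of task dicts
    for i in range(5):
        yield {"text": f"example {i}"}

tasks = stream()
# Prodigy pulls a batch at a time, roughly like this:
batch = list(itertools.islice(tasks, 2))
print([eg["text"] for eg in batch])
```

Because the generator is consumed lazily, anything you prepend to it before the next batch is requested will be sent out first, which is what the `update` callback below relies on.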
What you want to do requires updating that generator based on the most recent answer, and your script is missing a few steps to accomplish that.
The main component is an `update` callback that can modify the stream. It would look something like this (note: this is pseudo-code):
```python
import copy
import itertools
from typing import List, Optional

import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string

hierarchy = {"Non_ischemic_cardiomyopathy": ["dilated_cardiomyopathy", "sarcoidosis", "fabry"]}


@prodigy.recipe(
    "cardiac-classifier",
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("Path to annotated examples", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
)
def cardiac_classifier(dataset: str, source: str, label: Optional[List[str]] = None):
    nlp = spacy.load("en_core_web_sm")
    stream = JSONL(source)
    # You may also want to add top-level options to each incoming task, e.g.:
    # stream = ({**eg, "options": [...]} for eg in stream)

    def wrapped_stream():
        # Re-read the `stream` variable on every iteration, so that
        # reassignments made in update() below take effect
        while True:
            try:
                yield next(stream)
            except StopIteration:
                return

    def update(answers):
        # With batch_size 1 and instant_submit, we get one answer at a time
        nonlocal stream
        assert len(answers) == 1
        last_answer = answers[0]
        options = hierarchy.get(last_answer.get("label"))
        if options:
            # Queue a follow-up task asking about the sub-categories
            sub_task = copy.deepcopy(last_answer)
            del sub_task["label"]
            sub_task["options"] = [{"id": o, "text": o} for o in options]
            stream = itertools.chain([sub_task], stream)
        # update the model here if desired

    blocks = [
        {"view_id": "text"},
        {"view_id": "text_input", "field_label": "Left Ventricular Ejection Fraction (LVEF)"},
        {"view_id": "choice"},
    ]

    return {
        "dataset": dataset,  # needed to save the annotations
        "stream": wrapped_stream(),
        "update": update,
        "view_id": "blocks",
        "config": {
            "blocks": blocks,
            "batch_size": 1,
            "instant_submit": True,
        },
    }
```
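You'd then run the recipe from the command line with the `-F` flag to point at your recipe file (filenames here are hypothetical):

```shell
prodigy cardiac-classifier my_dataset ./examples.jsonl -F recipe.py
```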
Hopefully that helps point you in the right direction!