textcat: 2-level hierarchical classification

Hi, I currently have a custom recipe which uses blocks to display a chunk of text, a list of options to choose from, and a comment section at the bottom. For the options, I have a set of 2-level categories (6 top-level categories, each with 3 or 4 subcategories). After reading https://prodi.gy/docs/text-classification#large-label-sets and a few related posts, I still have not found a good solution for this. The multi-pass approach does not suit my task; I would rather have the annotators select a top-level category first (with the second level hidden), and then have the sub-categories for the selected category show up, either as a drop-down or simply as a list of choices displayed horizontally. For an example, see the multi-level classification annotation in the Label Studio interactive demo.

I would really appreciate any suggestions or related examples. Thanks.

Or is it possible to have two blocks on the same page, where the top block displays the first-level categories to choose from and, once a category is chosen, its sub-categories appear in the bottom block?

hi @blue482!

Thanks for your question, it's a good one!

First, it is important to note that (see the docs):

Multiple blocks of the same type are only supported for the text_input and html interface. For text input blocks, you can use the "field_id" to assign unique names to your fields and make sure they all write the user input to different keys. For HTML blocks, you can use different "html_template" values that reference different keys of your task.

[
  {"view_id": "text_input", "field_id": "user_input_a", "field_rows": 1},
  {"view_id": "text_input", "field_id": "user_input_b", "field_rows": 3},
  {"view_id": "html", "html_template": "<strong>{{content_a}}</strong>"},
  {"view_id": "html", "html_template": "<em>{{content_b}}</em>"}
]

So two blocks of the same type other than text_input or html (e.g., two choice blocks, one per category level) can't be used.

However, this post outlines how to create a simple HTML template (which can be duplicated in blocks) with a checkbox and a small JavaScript script that listens to the checked event of the box and updates the task with whether it's checked.

If you use multiple check boxes, I think it's possible to do something very similar to the multi-level example you showed.

There's an example recipe in that same chain that can get you started:

Can you give it a try?

We'd be interested if you're able to make any progress!

Many thanks for this! I have managed to make a second checkbox show up below the 1st block though currently the selection of this second checkbox is not being recorded in the database yet... long way to go.

Can I please ask if there is any way to display the 2nd block (i.e. options for the sub-categories) based on the selection for the 1st block (i.e. options for the main categories). So like conditional choices. Because I have different sub-categories for each category.

Does this help?

[Screenshot: CleanShot 2023-04-11 at 11.59.23]


Many thanks for the response @ryanwesslen !

I have tried incorporating the example code into my task:
1) Adding the code below as custom.js:

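// Show or hide the sub-options for one top-level button.
// Only two groups ("a" and "b") are handled: opening one resets the other.
function toggle(id) {
    var x = document.getElementById(id);
    if (id === "a") {
        reset("b");
    } else {
        reset("a");
    }
    if (x.style.display === "none") {
        x.style.display = "block";
    } else {
        x.style.display = "none";
    }
}

// Hide one group and uncheck every checkbox on the page.
function reset(id) {
    var x = document.getElementById(id);
    x.style.display = "none";
    var checkboxes = document.getElementsByClassName("checkbox");
    // for...of visits only the elements (for...in would also visit "length" etc.)
    for (const box of checkboxes) {
        box.checked = false;
    }
}

// Collect the ids of all checked boxes and write them to the task's "selected" key.
function update() {
    var checkboxes = document.getElementsByClassName("checkbox");
    var results = [];
    for (const box of checkboxes) {
        if (box.checked) {
            results.push(box.id);
        }
    }
    prodigy.update({
        selected: results
    });
}

// Collapse both groups whenever an answer is submitted.
document.addEventListener('prodigyanswer', event => {
    reset("a");
    reset("b");
});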

2) Adding the code below as template.jinja2:

<button onclick="toggle('a')">Option A</button>
<div id="a" style="display: none;">
    <form style="display: block;">
    {%- for reason in options["a"] -%}
        <input type="checkbox" class="checkbox" id="{{reason}}" name="{{reason}}" onchange="update()" style="margin: 0.4rem;"><label for="{{reason}}">{{reason}}</label><br>
    {%- endfor -%}
    </form>
</div>
<button onclick="toggle('b')">Option B</button>
<div id="b" style="display: none;">
    <form style="display: block;">
    {%- for reason in options["b"] -%}
        <input type="checkbox" class="checkbox" id="{{reason}}" name="{{reason}}" onchange="update()" style="margin: 0.4rem;"><label for="{{reason}}">{{reason}}</label><br>
    {%- endfor -%}
    </form>
</div>

3) And my Prodigy recipe is as simple as it can get:

import jinja2
from typing import Union
from pathlib import Path

import prodigy
from prodigy.util import msg
from prodigy import set_hashes
from prodigy.components.loaders import JSONL


@prodigy.recipe(
        "sdoh-test",
        dataset=("The dataset to save to", "positional", None, str),
        file_in=("Path to texts", "positional", None, str),
)
def textcat_w_comments(dataset, file_in):

    options = [
        {
            "a": [
                "sub-option a1",
                "sub-option a2"
            ] 
        },
        {
            "b": [
                "sub-option b1",
                "sub-option b2"
            ]
        },
    ]

    def add_template(stream):
        for ex in stream:
            ex['html'] = template.render(options=options)
            yield set_hashes(ex)

    def before_db(examples):
        for ex in examples:
            del ex['html']
        print(examples)
        return examples

    template = load_template("template.jinja2")
    custom_js = Path("custom.js").read_text() 

    stream = JSONL(file_in)

    blocks = [
        {"view_id": "text"},
        {"view_id": "html"},
    ]

    return {
        "view_id": "blocks",         
        "dataset": dataset,         
        "stream": add_template(stream),            
        "config": {
            "blocks": blocks,      
            "javascript": custom_js,
        },
        "before_db": before_db
    }


def load_template(path: Union[str, Path]) -> jinja2.Template:
    if not isinstance(path, Path):
        path = Path(path)
    if not path.suffix == ".jinja2":
        msg.fail(
            "Must supply jinja2 file.",
            exits=1,
        )
    with path.open("r", encoding="utf8") as file_:
        text = file_.read()
    return jinja2.Template(text, undefined=jinja2.DebugUndefined)

The result is that the two options (A and B) are displayed on the interface along with the text to annotate, but nothing happens when I click on the options: the sub-options/categories do not appear at all. It's strange, since my code should be very similar, if not identical, to the example code you kindly referred me to. I am wondering if there is anything obvious I am missing...

Great work!

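Looks like you just had the wrong format for options. The template looks up options["a"] and options["b"], so options needs to be a dict keyed by the top-level label rather than a list of single-key dicts. In your recipe, change this: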

    options = [
        {
            "a": [
                "sub-option a1",
                "sub-option a2"
            ] 
        },
        {
            "b": [
                "sub-option b1",
                "sub-option b2"
            ]
        },
    ]

to

    options = {
        "a": [
            "sub-option a1",
            "sub-option a2"
        ],
        "b": [
            "sub-option b1",
            "sub-option b2"
        ]
    }
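
If it helps, a small sanity check in the recipe could catch this kind of shape mismatch before the template is rendered. This is only a sketch: check_options is a hypothetical helper (not part of Prodigy), and it assumes options should always be a dict mapping each top-level label to a list of strings.

```python
def check_options(options):
    """Raise if options is not a dict of top-level label -> list of sub-labels.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    if not isinstance(options, dict):
        raise TypeError(f"options must be a dict, got {type(options).__name__}")
    for top, subs in options.items():
        if not isinstance(subs, list) or not all(isinstance(s, str) for s in subs):
            raise TypeError(f"options[{top!r}] must be a list of strings")
    return options


# The dict form passes and is returned unchanged
options = check_options({
    "a": ["sub-option a1", "sub-option a2"],
    "b": ["sub-option b1", "sub-option b2"],
})
```

Calling it on the list-of-dicts form would raise a TypeError right at startup, which is easier to spot than sub-options silently not rendering.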

One thing I noticed: this doesn't save the top-level choice to the data. It only saves the sub-options, under selected.

[
   {
      "text":"Uber’s Lesson: Silicon Valley’s Start-Up Machine Needs Fixing",
      "meta":{
         "source":"The New York Times"
      },
      "_input_hash":-1857271317,
      "_task_hash":402291892,
      "_view_id":"blocks",
      "selected":[
         "sub-option a1"
      ],
      "answer":"accept",
      "_timestamp":1687462245,
      "_annotator_id":"2023-06-22_15-30-38",
      "_session_id":"2023-06-22_15-30-38"
   }
]

One option is to add a mapping in before_db that looks up the corresponding top-level category in options for each selected sub-option. There's also definitely an opportunity to generalize this recipe and its scripts so that the user supplies the options as a .jsonl file.

Hope this helps!


Many thanks @ryanwesslen !!

I have changed it to

    options = {
        "a": [
            "sub-option a1",
            "sub-option a2"
        ],
        "b": [
            "sub-option b1",
            "sub-option b2"
        ]
    }

But still nothing happens when I click on the "Option A" or "Option B" button.

I'm wondering if you may have forgotten to update either the custom.js or template.jinja2.

I created a reproducible example here:

Note that I generalized it a bit by passing the labels as a labels.jsonl file within the command.

Can you try this? It seemed to work fine for me.

[Image: textcat-hierarchical]


Thanks very much for the detailed response @ryanwesslen !

That's strange! I have tweaked my code now and it should look exactly like yours. Mine: Prodigy hierarchical text classification (testing) · GitHub

But the result is the same as before, with no response whatsoever when I click on the options... I dunno what is missing...

hi @blue482,

Thanks for the update. Yes, the code looks the same and I didn't have a problem with your code running locally. This leads me to believe it may be a browser issue locally on your computer.

Just curious - what browser are you using? If Chrome, can you right click and then select "Inspect" and see if you have any JavaScript errors?

Also, just to check, what version of Prodigy are you using?

You can run prodigy stats and provide that output. I don't think that would be the problem unless you're running a very old version of Prodigy. But worth checking.

Just curious - what browser are you using? If Chrome, can you right click and then select "Inspect" and see if you have any JavaScript errors?

I am using Chrome (have tested on Edge as well), and I get the error below:
"Uncaught ReferenceError: toggle is not defined at HTMLButtonElement.onclick (?session=bo:1:1)"

Also, just to check, what version of Prodigy are you using?

Prodigy version == 1.11.14
spaCy version == 3.5.3

python -m prodigy stats

============================== ✨  Prodigy Stats ==============================

Version          1.11.14
Location         D:\bo\envs\sdoh_anno\lib\site-packages\prodigy
Prodigy Home     C:\Users\BW720\.prodigy
Platform         Windows-10-10.0.19041-SP0
Python Version   3.7.13
Database Name    SQLite
Database Id      sqlite
Total Datasets   3
Total Sessions   53

Hi @blue482,

Thanks! One possibility - did you set "javascript" in either your global prodigy.json (i.e., in C:\Users\BW720\.prodigy) or locally (i.e., in the same folder)? When I have something (e.g., "javascript": ""), it'll overwrite the javascript provided in custom.js, which will then lead to a similar problem (i.e., I can't toggle the buttons).

If this were the case, you should see this warning in CLI making you aware of this problem:

⚠ Config setting 'javascript' defined in recipe is overwritten by a
different value set in the global or local prodigy.json. This may lead to
unexpected results and potentially changes to the core behavior of the recipe.
If that's surprising, you should probably remove the setting 'javascript' from
your prodigy.json.

Also - just curious, what happens if you try on Python 3.8 or higher? I'm running on Python 3.9. I can't directly see why this would matter, but I'm trying to think outside the box for any possibilities.
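
If you'd like to check for this quickly, something like the script below could help. It's only a rough sketch: find_javascript_overrides is a hypothetical helper (not a Prodigy function), and it assumes the global config lives in the default .prodigy folder in your home directory and the local one in the current working directory.

```python
import json
from pathlib import Path


def find_javascript_overrides(paths):
    """Return the prodigy.json paths that define a "javascript" key.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    hits = []
    for path in map(Path, paths):
        if not path.is_file():
            continue
        config = json.loads(path.read_text(encoding="utf8"))
        # The key being present is what matters, even with a value of "" or null
        if isinstance(config, dict) and "javascript" in config:
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Assumed default locations: global config in ~/.prodigy, local config in the cwd
    candidates = [
        Path.home() / ".prodigy" / "prodigy.json",
        Path.cwd() / "prodigy.json",
    ]
    for hit in find_javascript_overrides(candidates):
        print(f"'javascript' is set in {hit}")
```

If it prints anything, removing the "javascript" entry from that file should let the recipe's own custom.js through.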


Thanks! One possibility - did you set "javascript" in either your global prodigy.json (i.e., in C:\Users\BW720\.prodigy) or locally (i.e., in the same folder)? When I have something (e.g., "javascript": ""), it'll overwrite the javascript provided in custom.js, which will then lead to a similar problem (i.e., I can't toggle the buttons).

Aha yeah, that's the problem I think! I have a prodigy.json file in the same directory as my recipe and other files, and it contained "javascript": null. After removing "javascript": null, I can now see the sub-options. Great! I will check whether the right data is being stored.

Thanks again for helping out @ryanwesslen !

Would you please give an example of adding top-level selection in the annotation results using before_db? I couldn't figure it out unless changing the javascript file. What do you mean by

generalize this recipe/scripts so that the user inputs the options as a .jsonl file

?

I updated my gist so now it should add the top level to the saved annotation data:

In the recipe.py, I added this function:

def get_upper(input_strings, dictionary):
    if input_strings is None:  # nothing selected (no "selected" key), return empty list
        return []
    matching_keys = []
    for key, value in dictionary.items():
        if isinstance(value, list) and any(elem in value for elem in input_strings):
            matching_keys.append(key)
    return matching_keys

Then I updated the before_db() function to call this function and assign the result to a new field, upper_selected:

def before_db(examples):
    for ex in examples:
        del ex['html']
        ex['upper_selected'] = get_upper(ex.get('selected'), options) # added
    return examples
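
As a variation on get_upper, the lookup could also be inverted once up front so that each sub-option maps straight to its parent. This is just a sketch: build_parent_index and upper_for are hypothetical names, and it assumes every sub-option name is unique across the top-level categories.

```python
def build_parent_index(options):
    """Map each sub-option to its top-level category (assumes unique sub-option names)."""
    return {sub: top for top, subs in options.items() for sub in subs}


def upper_for(selected, parent_index):
    """Return the top-level categories for the selected sub-options, in order, without duplicates."""
    uppers = []
    for sub in selected or []:  # treat a missing "selected" key (None) as no selection
        top = parent_index.get(sub)
        if top is not None and top not in uppers:
            uppers.append(top)
    return uppers


options = {
    "a": ["sub-option a1", "sub-option a2"],
    "b": ["sub-option b1", "sub-option b2"],
}
index = build_parent_index(options)
print(upper_for(["sub-option a1", "sub-option a2"], index))  # ['a']
```

In before_db, the call would then be along the lines of ex['upper_selected'] = upper_for(ex.get('selected'), index), with index built once when the recipe starts.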

Now, the data will have an extra field for the upper (top-level) data.

{
  "text": "Uber’s Lesson: Silicon Valley’s Start-Up Machine Needs Fixing",
  "meta": {
    "source": "The New York Times"
  },
  "_input_hash": 1886699658,
  "_task_hash": -1952856502,
  "_view_id": "blocks",
  "selected": [
    "sub-option a1",
    "sub-option a2"
  ],
  "answer": "accept",
  "_timestamp": 1688585735,
  "_annotator_id": "2023-07-05_15-35-30",
  "_session_id": "2023-07-05_15-35-30",
  "upper_selected": [
    "a"
  ]
}

As for generalizing the recipe so that the options come from a .jsonl file: you can ignore that comment. I've already implemented it in my gist, i.e., options is now an argument of the recipe.

There are likely other ways you can improve or generalize this function, but hopefully this will do the trick!
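
For anyone curious what reading the options from a .jsonl file might look like, here is a minimal sketch. The file layout is an assumption made for illustration (one JSON object per line with hypothetical "top" and "sub" keys) and may not match the format used in the gist; load_options is likewise a hypothetical name.

```python
import json
from pathlib import Path


def load_options(path):
    """Read a .jsonl file into {top-level label: [sub-labels]}.

    Assumed line format (hypothetical): {"top": "a", "sub": ["sub-option a1", ...]}
    """
    options = {}
    for line in Path(path).read_text(encoding="utf8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        options[record["top"]] = list(record["sub"])
    return options
```

The resulting dict has the same shape as the hard-coded options above, so it could be passed to template.render(options=options) unchanged.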