Provided label for custom (classification) recipe doesn't show up, and multiple labels cause error

The minimum reproducible test code is at the bottom. The test data is a collection of integers and annotators are supposed to label whether a number is odd or even. When I run this custom recipe by providing one label (e.g., EVEN), the label does not show up (see code and screenshot below).

E:\Projects>python -m prodigy custom-textcat-manual number_test test_data_numbers.jsonl --label EVEN -F E:\Projects\

When I provide more than one label (e.g., EVEN,ODD), I get an error (see code and screenshot below).

E:\Projects>python -m prodigy custom-textcat-manual number_test test_data_numbers.jsonl --label EVEN,ODD -F E:\Projects\

I wonder if I have made any mistakes in my setup. Thanks again for all your help!

Custom recipe:

import prodigy
from prodigy.util import log, msg, get_labels, split_string, INPUT_HASH_ATTR
from prodigy.components.loaders import get_stream
from prodigy.components.db import connect, Database, Dataset, Example, Link

@prodigy.recipe(
    "custom-textcat-manual",
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("Data to annotate (file path or '-' to read from standard input)", "positional", None, str),
    label=("Comma-separated label(s) to annotate or text file with one label per line", "option", "l", get_labels),
    view_id=("Annotation interface", "option", "v", str),
    exclusive=("Treat classes as mutually exclusive (if not set, an example can have multiple correct classes)", "flag", "E", bool),
)
def custom_textcat_manual(dataset, source, label, view_id="text", exclusive=False):
    # Exit early if no label was passed on the command line
    if not label:
        msg.fail("At least one label is required", exits=1)
    labels = label
    # More than one label means the choice interface is used
    has_options = len(labels) > 1
    stream = get_stream(source, rehash=True, dedup=True, input_key="text")

    return {"dataset": dataset,
            "view_id": "choice" if has_options else "classification",
            "stream": stream,
            "config": {"labels": labels,
                       "choice_style": "single" if exclusive else "multiple"}}

I think the problem here is that you're not actually adding the single label or the label options to your data. The classification UI expects each record to have some content and a "label", which is then displayed at the top. And if you pass in multiple labels, the interface you're using is choice, which expects each record to have "options", one choice option per label.

That's also what the textcat.manual recipe does under the hood: if there's only one label, it adds a "label" to each example in the stream. If there are multiple labels, it adds "options" with one option per label to the stream.
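As a rough illustration of that logic, here is a minimal sketch in plain Python. The helper name add_label_or_options is hypothetical and not part of Prodigy's API, and the {"id": ..., "text": ...} shape of each option is an assumption based on how the choice interface is usually fed, so treat it as a starting point rather than the actual textcat.manual source.

```python
def add_label_or_options(stream, labels):
    """Hypothetical helper (not part of Prodigy): attach the label data
    that the annotation interface needs to every task in the stream."""
    # Assumed option format for the choice interface: one dict per label
    options = [{"id": label, "text": label} for label in labels]
    for task in stream:
        task = dict(task)  # shallow copy so the input record is not mutated
        if len(labels) == 1:
            # Single label: the classification interface reads "label"
            task["label"] = labels[0]
        else:
            # Several labels: the choice interface reads "options"
            task["options"] = options
        yield task


# Toy records standing in for the lines of test_data_numbers.jsonl
examples = [{"text": "3"}, {"text": "8"}]
single = list(add_label_or_options(examples, ["EVEN"]))
multi = list(add_label_or_options(examples, ["EVEN", "ODD"]))
```

In the custom recipe, the stream returned by get_stream would be wrapped with this helper (stream = add_label_or_options(stream, labels)) before it is placed in the returned dictionary.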


Thanks! Not having the label in the data was indeed the issue. Now everything is working swimmingly.
