Can't get labels to be shown.

steffres · May 4, 2020, 6:17pm

I'm trying to set up multi-label text classification but am failing at the basics of replicating and modifying recipes from this repo: https://github.com/explosion/prodigy-recipes

Firstly: Problems with basic examples

I can't get a provided recipe running as described in the repo. There it says to run (taking NER as an example):

python -m prodigy -F prodigy-recipes/ner/ner_teach.py

which results in this error:

✘ Can't find recipe or command '-F'.
Run prodigy --help to see available options

Could this be outdated? All the other run commands I've seen in the Prodigy documentation pass further arguments such as datasets, models, sources, etc. Additionally, the method signature (and decorator arguments) in ner_teach.py also indicate the need to pass arguments on the command line.

Secondly: Problems with adapting examples

Now, since I need text classification I've run the textcat_* scripts in the repo without arguments, to no avail, and then with arguments. In order to zero in on error causes I've cut away a lot of code from the original function. After that and after running with respective arguments in the command line a prodigy instance is successfully started up and sample text is correctly loaded, however the label choice form is missing.

The code is this:

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.models.textcat import TextClassifier
from prodigy.util import split_string
import spacy
from typing import List, Optional

@prodigy.recipe(
    "foo_cat",
    dataset=("The dataset to use", "positional", None, str),
    spacy_model=("The base model", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
)
def foo_cat(
    dataset: str,
    spacy_model: str,
    source: str,
    label: Optional[List[str]] = None,
):
    nlp = spacy.load(spacy_model)
    model = TextClassifier(nlp, label)
    update = model.update
    stream = JSONL(source)

    return {
        "view_id": "classification",
        "dataset": dataset,
        "stream": stream,
        "update": update,
    }

and the command is this:

prodigy foo_cat some_dataset "de_core_news_sm" news_headlines.jsonl -F prodigy-recipes/textcat/foo_cat.py --label bla,ble,blo

Which runs and results in the web server looking like this:

And here the label choice form is missing. I'd like to have a non-mutually-exclusive form like this here: https://prodi.gy/docs/text-classification#manual

How to do this please?

prodigy stats:

============================== ✨  Prodigy Stats ==============================

Version          1.9.9                         
Location         /home/steff-vm/mara/acdh-prodigy-utils/venv/lib/python3.6/site-packages/prodigy
Prodigy Home     /home/steff-vm/.prodigy       
Platform         Linux-5.3.0-51-generic-x86_64-with-Ubuntu-18.04-bionic
Python Version   3.6.9                         
Database Name    SQLite                        
Database Id      sqlite                        
Total Datasets   9                             
Total Sessions   105

Cheers,
Stefan

ines · May 4, 2020, 6:55pm

Hi! I just had a look at the README and it's true that this example is a bit unfortunate – I think it was supposed to have a ... to indicate the respective settings, but I just updated it to show an example command with example arguments.

Yes, in addition to just the -F pointing to the path, you do have to provide the name of the recipe and the arguments it needs. Otherwise, Prodigy can't know which recipe in the file you want to run (there can be multiple) and how the recipe should be configured. Also see the custom recipes documentation here: Custom Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP

For the recipe in ner_teach.py, that could look like this:

prodigy ner.teach your_dataset en_core_web_sm ./data.jsonl --label PERSON -F ./ner_teach.py

As for the missing labels in your foo_cat recipe: the problem is that you're not actually using the model to process any examples in your stream or to add the labels – the recipe just uses the JSONL loader to load the incoming examples and then returns those. The examples don't have a "label", so it's not shown in the UI.

So how you set up the recipe depends on what you want – if you want the model to apply labels, you probably want something more similar to the textcat.teach example. Or if you just want to show examples with a pre-defined label, you can add it to each outgoing example, e.g. like this:

def get_stream():
    stream = JSONL(source)
    for eg in stream:
        eg["label"] = label
        yield eg

The choice UI uses the choice interface. You can see examples of the data format it expects and the config options here: Annotation interfaces · Prodigy · An annotation tool for AI, Machine Learning & NLP

The recipes repo also has an example script using it here: https://github.com/explosion/prodigy-recipes/blob/master/other/choice.py

And the custom recipes docs have an example that shows how to put together a workflow with multiple choice options: Custom Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP
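As a rough sketch of the data format the choice interface works with, here is one way to attach the same set of options to every incoming example. It uses plain dicts instead of a Prodigy loader so it runs on its own; the helper name add_options and the sample texts are only for illustration and are not part of Prodigy.

```python
from typing import Dict, Iterable, Iterator, List


def add_options(stream: Iterable[Dict], labels: List[str]) -> Iterator[Dict]:
    """Yield a copy of each task with one choice option per label."""
    for eg in stream:
        task = dict(eg)  # copy so the incoming example is left untouched
        # Each option needs an "id" and a display "text".
        task["options"] = [{"id": label, "text": label} for label in labels]
        yield task


# Stand-in for examples loaded from a JSONL file (illustrative data only).
examples = [{"text": "text 1"}, {"text": "text 2"}]
tasks = list(add_options(examples, ["bla", "ble", "blo"]))
```

In a recipe, the generator returned by add_options would be passed as the "stream" together with "view_id": "choice".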

steffres · May 25, 2020, 11:08am

Thanks Ines, that mostly resolved my questions. But I'm still struggling to get the choice view_id to show the labels as non-mutually exclusive (as it even should be set by default according to the docs).

So, the reduced code is this:

import prodigy

def foo_stream():

    options = [
        {"id": "option_a", "text": "option_a"},
        {"id": "option_b", "text": "option_b"},
        {"id": "option_c", "text": "option_c"},
    ]

    yield {"text": "text 1", "options": options}
    yield {"text": "text 2", "options": options}
    yield {"text": "text 3", "options": options}


@prodigy.recipe(
    "choice",
)
def choice():

    return {
        "view_id": "choice",
        "dataset": "some_dataset",
        "stream": foo_stream(),
        "config": {
            "choice_style": "multiple",
            "choice_auto_accept": True,
        },
    }

This displays the options correctly but shows them in a mutually exclusive manner, i.e. only one option can be selected at a time. What am I doing wrong here?

ines · May 26, 2020, 8:47am

It looks like in your config, you're setting "choice_style": "multiple", which makes the interface accept multiple selections at the same time. If you set it to "single" (or remove that line altogether), you'll only be able to select one option.

steffres · May 27, 2020, 6:45am

The other way around: It is showing the options as single, i.e. mutually exclusive, but I want it to be multiple, non-mutually exclusive.

This is the browser UI resulting from the code above. As I take it from your docs, the round radio buttons mean mutually exclusive options (and of course they behave like that, though that can't be shown here):

ines · May 27, 2020, 11:13am

Ahh, misread the question, sorry! Check if your global or local prodigy.json overrides the "choice_style" value then and if so, remove that entry. The global and local config lets you overwrite recipe defaults, but if it overrides stuff like "choice_style", this will apply to all recipes and is typically not what you want.
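A quick way to check for such an override could look like the sketch below. It assumes the global config sits in the Prodigy home directory (~/.prodigy by default, or wherever the PRODIGY_HOME environment variable points) and the local one in the current working directory. find_overrides is a hypothetical helper written for this example, not a Prodigy function.

```python
import json
import os
from pathlib import Path
from typing import Dict, Iterable


def find_overrides(paths: Iterable[Path], key: str = "choice_style") -> Dict[str, object]:
    """Return {config path: value} for every existing config file that sets `key`."""
    found = {}
    for path in paths:
        if not path.is_file():
            continue  # a missing config file cannot override anything
        with path.open(encoding="utf8") as f:
            config = json.load(f)
        if key in config:
            found[str(path)] = config[key]
    return found


# Assumed locations of the global and local prodigy.json.
prodigy_home = Path(os.environ.get("PRODIGY_HOME", Path.home() / ".prodigy"))
candidates = [prodigy_home / "prodigy.json", Path.cwd() / "prodigy.json"]
for path, value in find_overrides(candidates).items():
    print(f"{path} sets choice_style = {value!r}")
```

If this prints anything, removing the "choice_style" entry from the listed file should let the value from the recipe's own "config" take effect.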

steffres · May 28, 2020, 11:37am

That was it. Thank you Ines!
