No tasks available when trying manual NER with custom labels and custom data stream

I'm trying to do some initial testing of the manual NER interface on documents in my corpus, but when I load the web server, it tells me there are no tasks available. Note: I am running Prodigy version 1.4.0.

The recipe code is just:

import prodigy
from prodigy.recipes.ner import manual as ner_manual  # import recipe function

@prodigy.recipe('custom-ner.manual')
def custom_ner_manual(dataset, spacy_model, label):
    all_docs = get_all_docs()  # list of 10 dicts
    stream = ({'id': row['id'], 'text': row['text']} for row in all_docs)

    # call the ner_manual function with the same arguments as the CLI version
    # this will return a dictionary of components
    components = ner_manual(dataset, spacy_model, source=stream, label=label)
    return components  # return the components by the custom recipe

and I am running with

prodigy custom-ner.manual my_dataset en_core_web_lg MyLabel -F prodigy-recipe.py

where prodigy-recipe.py is a file that contains just the recipe above.

Is there something else I need to do for my intended configuration to work?

Thanks for the report! This is strange, and I don’t see anything about your recipe that might be wrong.

If you run the command with PRODIGY_LOGGING=verbose, Prodigy will log everything that’s happening behind the scenes, and also include content of the batches that are passed around. Does anything here look suspicious? And can you confirm that all tasks that are forwarded to the front-end have the expected format?

(We’re currently working on improving the validation of the incoming tasks on the front-end and server-side, so Prodigy will be able to produce more meaningful warnings.)

Btw, if you only need to plug in your own loader and want to remove one recipe layer of complexity, you can also use a custom script and pipe the dumped JSON forward to the ner.manual recipe. If the source argument of a recipe is not set, it defaults to sys.stdin. So your load_docs.py script could look like this:

import json

all_docs = get_all_docs()  # your own loader
for row in all_docs:
    print(json.dumps({'text': row['text']}))

And you can then run it like this:

python load_docs.py | prodigy ner.manual your_dataset en_core_web_lg --label SOME_LABEL
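
For reference, the piped approach relies on the script printing one JSON object per line. Here is a minimal sketch of how such a stream can be consumed in plain Python. Note that `read_jsonl` is a hypothetical helper for illustration only, not Prodigy's actual loader, and the sample texts are made up:

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Simulate what load_docs.py would print to stdout (hypothetical sample data).
piped = io.StringIO('{"text": "First document"}\n{"text": "Second document"}\n')
tasks = list(read_jsonl(piped))
```

Passing `sys.stdin` instead of the `io.StringIO` object would read the real piped output in the same way.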

Thanks for the support! When I tried the same documents as input via a pipe, it looks to have worked just fine.

Comparing the (non-verbose) console output, I see that with the pipe and your command above, it prints:
Using 1 labels: SOME_LABEL

I don’t see similar output when using my custom recipe, although I do see the label if I print it before passing it to ner_manual.

Fwiw: the JSON in the response for get_questions looked the same as far as I could tell between the custom recipe and piped documents version.

The pipe will allow me to experiment for now, though later on I will be adding more customization. Thanks!

Thanks for updating and glad you got it running! 👍

I think I know what the problem is: When you call the recipe function directly within Python, the conversion of CLI input → value used by Prodigy doesn’t happen, so the string labels are not converted to a list. So you’re passing in "MyLabel", when it should be ["MyLabel"]. Prodigy should probably fail more gracefully here – ideally before it even starts the server.

If you want to use a custom recipe, the simplest solution would be to convert your label argument to a list, e.g. by splitting it on commas and then passing the list forward.
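
A minimal sketch of that conversion, assuming the label arrives as a single comma-separated string. `split_labels` is a hypothetical helper name, not part of Prodigy:

```python
def split_labels(label):
    """Turn 'A, B' into ['A', 'B']; pass through values that are already lists."""
    if isinstance(label, str):
        return [part.strip() for part in label.split(",") if part.strip()]
    return list(label)


labels = split_labels("MyLabel")          # ['MyLabel']
multi = split_labels("PERSON, ORG,GPE")   # ['PERSON', 'ORG', 'GPE']
```

In the recipe above, this would mean calling ner_manual(dataset, spacy_model, source=stream, label=split_labels(label)).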

A more elegant solution would be to also add argument annotations to the label argument of your recipe:

@prodigy.recipe('custom-ner.manual', 
    label=prodigy.recipe_args['label_set'])

Prodigy’s argument annotations follow the same style as Plac and are (description, type, shortcut, converter) tuples. The recipe_args include the most common ones to make it easier to reuse them. So under the hood, the argument annotation will look like this:

label=("Label(s) to annotate.", "option", "l", get_labels)

The description shows up when you run the command with --help. -l is the shortcut and get_labels is a function that’s called on the string input before it’s passed to your recipe function. The converter can also be a built-in like bool, or something custom as in this case. get_labels takes the string and checks whether it’s a file path: if so, it reads the labels from that file; otherwise, it splits the string on commas and strips off whitespace.
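
To make that behaviour concrete, here is a rough sketch of such a converter. This is an illustration only, not Prodigy's actual get_labels code. The name `parse_labels` and the one-label-per-line file format are assumptions:

```python
from pathlib import Path


def parse_labels(value):
    """Read labels from a file if `value` is an existing path, else split on commas."""
    path = Path(value)
    if path.is_file():
        # Assumed file format: one label per line.
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return [part.strip() for part in value.split(",") if part.strip()]
```

A converter like this always returns a list, so the recipe function never sees a bare string.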

That looks like it! The simple modification of my recipe to make it a list worked.

Thanks again!
