source argument is not optional (should default to stdin)

The docs state that the source argument (e.g. for textcat.teach) is optional and defaults to sys.stdin. However, this does not work in practice:

> cat my_data.jsonl | python -m prodigy textcat.html.teach -l ACCEPT -F textcat_html.py --lo jsonl some_dataset en_core_web_md
usage: prodigy textcat.html.teach [-h] [-a None] [-lo None] [-l] dataset spacy_model source
prodigy textcat.html.teach: error: the following arguments are required: source

How does your custom textcat.html.teach recipe look? From the error message, it seems like the source argument isn’t optional there.

In Prodigy’s built-in recipes, the source argument should be optional and default to None. The get_stream helper then handles that and loads the source – either from a file with a given loader, or from stdin if it’s None. In a custom recipe, you can choose to do it the same way – or require the argument. That’s up to you.
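To illustrate the pattern described above, here is a minimal sketch that uses only the Python standard library. It is not Prodigy's actual code: build_parser and load_stream are hypothetical names, and load_stream is only a stand-in for what a helper like get_stream does when the source is None. The idea is that the positional source defaults to None and the loader falls back to stdin in that case.

```python
import argparse
import json
import sys


def build_parser():
    # Hypothetical CLI, for illustration only (not Prodigy's own parser).
    parser = argparse.ArgumentParser(prog="my-recipe")
    parser.add_argument("dataset")
    parser.add_argument("spacy_model")
    # nargs="?" makes the positional optional; it is None when omitted.
    parser.add_argument("source", nargs="?", default=None)
    return parser


def _read_jsonl(lines):
    # Parse one JSON object per non-empty line.
    for line in lines:
        if line.strip():
            yield json.loads(line)


def load_stream(source=None, stdin=None):
    """Yield JSONL records from a file path, or from stdin if source is None."""
    if source is None:
        # No source given: read whatever was piped in.
        yield from _read_jsonl(stdin if stdin is not None else sys.stdin)
    else:
        with open(source, encoding="utf8") as handle:
            yield from _read_jsonl(handle)
```

With a parser like this, piping data in and leaving out the last positional argument no longer raises the "arguments are required: source" error. A custom recipe that declares source without a default behaves like a required positional instead, which matches the error shown above.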

Ah yes, looks like I copied the recipe from somewhere where source was indeed not optional by default. My bad. To be honest I forgot that I wasn’t using a built-in recipe - thanks for the quick help!
