source argument is not optional (should default to stdin)

(Philipp Dowling) #1

The docs state that the source argument (e.g. for textcat.teach) is optional and defaults to sys.stdin. However, this does not work in practice:

> cat my_data.jsonl | python -m prodigy textcat.html.teach -l ACCEPT -F --lo jsonl some_dataset en_core_web_md
usage: prodigy textcat.html.teach [-h] [-a None] [-lo None] [-l] dataset spacy_model source
prodigy textcat.html.teach: error: the following arguments are required: source


(Ines Montani) #2

What does your custom textcat.html.teach recipe look like? From the error message, it seems like the source argument isn’t optional there.

In Prodigy’s built-in recipes, the source argument should be optional and default to None. The get_stream helper then handles that and loads the source – either from a file with a given loader, or from stdin if it’s None. In a custom recipe, you can choose to do it the same way – or require the argument. That’s up to you.
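
As a rough illustration of that pattern, here is a minimal sketch that uses only the Python standard library. It is not Prodigy's actual get_stream implementation: the read_tasks function, its stdin parameter, and the assumption that the input is JSONL are all hypothetical and only there to show the "default to None, then fall back to stdin" idea.

```python
import json
import sys
from typing import Iterator, Optional, TextIO


def read_tasks(source: Optional[str] = None,
               stdin: Optional[TextIO] = None) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line.

    Hypothetical helper: reads from the file at `source`, or from
    standard input when `source` is None.
    """
    if source is None:
        # No path given: fall back to stdin (e.g. `cat data.jsonl | ...`).
        # The `stdin` parameter only exists so the fallback can be tested.
        lines = stdin if stdin is not None else sys.stdin
        for line in lines:
            if line.strip():
                yield json.loads(line)
    else:
        # A path was given: read the file line by line.
        with open(source, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)
```

In a custom recipe, giving source a default of None in the function signature is what allows it to be left out on the command line. Without a default, the argument is treated as required, which matches the error above.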


(Philipp Dowling) #3

Ah yes, looks like I copied the recipe from somewhere where source was indeed not optional by default. My bad. To be honest I forgot that I wasn’t using a built-in recipe - thanks for the quick help!
