I noticed that make_gold is deprecated, and I'd like a way to customize built-in recipes that still tracks changes made to the originals. Ideally, I'd also be able to combine multiple recipes in one running Prodigy instance (for example, a separate recipe that loads a custom DB while I leave the ner.correct recipe untouched).
Is this available now, or am I really just making a feature request?
I wouldn't say deprecated; we just renamed the recipe to ner.correct for consistency.
Under the hood, Prodigy recipes are just Python functions, so you can import a recipe function and wrap it in another recipe function. A recipe function returns a dictionary of components, so when you call an existing recipe function inside your custom recipe, you get that dictionary back. Before you return it from your own recipe, you can modify it.
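Because it's all plain Python, the call, modify, return pattern can be tried without Prodigy installed. Here's a minimal runnable sketch; `built_in_recipe`, `add_meta` and `my_recipe` are hypothetical stand-ins (a real built-in recipe returns more components than this one):

```python
from typing import Any, Dict, Iterable, Iterator


def built_in_recipe(dataset: str, source: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a built-in recipe: returns a dict of components."""
    stream = ({"text": line} for line in source.splitlines())
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "ner_manual",
        "config": {"labels": ["PERSON", "ORG"]},
    }


def add_meta(stream: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Hypothetical stream wrapper: tag each example as it passes through."""
    for eg in stream:
        eg["meta"] = {"source": "custom"}
        yield eg


def my_recipe(dataset: str, source: str) -> Dict[str, Any]:
    """Hypothetical wrapper: call the built-in function, patch its components, return them."""
    components = built_in_recipe(dataset, source)
    # Wrap the existing stream instead of changing how it's loaded
    components["stream"] = add_meta(components["stream"])
    # Override one config setting and keep everything else the built-in recipe set
    components["config"]["exclude_by"] = "task"
    return components


components = my_recipe("my_dataset", "Hello Berlin\nHi Apple")
print(components["config"])  # {'labels': ['PERSON', 'ORG'], 'exclude_by': 'task'}
```

Since the stream is a generator, wrapping it stays lazy: nothing is read until the examples are actually requested.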
Here's a simple (semi-pseudocode) example that shows a few things you can do:
```python
import prodigy
from prodigy.recipes.ner import make_gold  # function is still called make_gold
from prodigy.util import split_string, get_labels


@prodigy.recipe(
    "ner.correct-custom",  # hypothetical name for the wrapper recipe
    dataset=("Dataset to save annotations to", "positional", None, str),
    spacy_model=("Loadable spaCy model with an entity recognizer", "positional", None, str),
    source=("Data to annotate (file path or '-' to read from standard input)", "positional", None, str),
    api=("DEPRECATED: API loader to use", "option", "a", str),
    loader=("Loader (guessed from file extension if not set)", "option", "lo", str),
    label=("Comma-separated label(s) to annotate or text file with one label per line", "option", "l", get_labels),
    exclude=("Comma-separated list of dataset IDs whose annotations to exclude", "option", "e", split_string),
    unsegmented=("Don't split sentences", "flag", "U", bool),
    # Add custom recipe CLI arguments
    db_name=("Name of custom DB to load", "option", "db", str),
)
def ner_correct_custom(dataset, spacy_model, source, api=None, loader=None, label=None,
                       exclude=None, unsegmented=False, db_name=None):
    # Call the built-in recipe function to get its dictionary of components
    components = make_gold(dataset, spacy_model, source, api, loader, label, exclude, unsegmented)
    # Overwrite recipe components returned by the recipe and use custom arguments
    # (LoadMyCustomDB is a placeholder for your own database loading code)
    components["db"] = LoadMyCustomDB(db_name)
    # Overwrite config settings
    components["config"]["exclude_by"] = "task"
    # Return recipe components
    return components
```
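If you end up wrapping several built-in recipes the same way, the "call, patch, return" step could be factored into a small helper. This is only a sketch: `with_overrides` and `base_recipe` are hypothetical names, not part of Prodigy, and each variant would still need to be registered with `@prodigy.recipe` and declare its CLI arguments like the recipe above.

```python
import functools
from typing import Any, Callable, Dict, Optional


def with_overrides(
    recipe_func: Callable[..., Dict[str, Any]],
    config: Optional[Dict[str, Any]] = None,
    **components: Any,
) -> Callable[..., Dict[str, Any]]:
    """Hypothetical helper: return a function that calls recipe_func and patches its result.

    Keyword arguments replace top-level components; `config` entries are
    merged into the existing "config" dict with a shallow update.
    """

    @functools.wraps(recipe_func)
    def wrapped(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        result = recipe_func(*args, **kwargs)
        result.update(components)
        if config:
            result.setdefault("config", {}).update(config)
        return result

    return wrapped


# Usage with a hypothetical stand-in recipe function
def base_recipe(dataset: str) -> Dict[str, Any]:
    return {"dataset": dataset, "view_id": "ner_manual", "config": {"lang": "en"}}


patched = with_overrides(base_recipe, config={"exclude_by": "task"}, view_id="ner")
print(patched("my_dataset"))
# {'dataset': 'my_dataset', 'view_id': 'ner', 'config': {'lang': 'en', 'exclude_by': 'task'}}
```

Because the config merge is shallow, a nested setting passed in `config` replaces the existing value rather than being merged into it.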
The source argument passed into the recipe function can also be an already-loaded stream generator. So instead of passing the source name forward, your recipe wrapper can load your data however it wants and then pass in the stream, like this: