Customizing ner.correct


I'm looking to build recipes that add functionality to ner.correct (for example pattern matching, parameterizing the DB to save the dataset in, etc.).

Is the protocol to copy and paste the code from here, and then add parameters to the recipe function and key/value pairs in the dictionary that is returned?

I noticed that make_gold is deprecated, and would like a way to customize built-in recipes that tracks existing changes. Or even to be able to combine multiple recipes in one running Prodigy instance (for example, a separate recipe that loads a custom DB while leaving the ner.correct recipe untouched).

Is this available now, or am I really just making a feature request?


I wouldn't say deprecated, but we renamed it for consistency :slightly_smiling_face:

Under the hood, Prodigy recipes are just Python functions, so you can also import a recipe function and wrap it in another recipe function. A recipe function returns a dictionary of components – so when you call an existing recipe function in your custom recipe, you get back a dictionary of components that you can modify before returning it.
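The wrapping pattern itself is plain Python and doesn't depend on Prodigy at all. Here's a toy sketch with made-up function names (`base_recipe`, `custom_recipe`) that shows the idea: the inner function returns a dict of components, and the wrapper modifies that dict before returning it.

```python
def base_recipe(dataset, source):
    # A "recipe" is just a function returning a dict of components
    return {
        "dataset": dataset,
        "stream": iter(source),
        "config": {"exclude_by": "input"},
    }

def custom_recipe(dataset, source):
    # Call the existing recipe, then tweak its components before returning
    components = base_recipe(dataset, source)
    components["config"]["exclude_by"] = "task"  # overwrite a config setting
    return components

components = custom_recipe("my_dataset", ["Some text"])
print(components["config"]["exclude_by"])  # prints "task"
```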

Here's a simple (semi-pseudocode) example that illustrates a few of these possibilities.

import prodigy
from prodigy.recipes.ner import make_gold  # function is still called make_gold
from prodigy.util import split_string, get_labels

@prodigy.recipe(
    "custom.make-gold",
    dataset=("Dataset to save annotations to", "positional", None, str),
    spacy_model=("Loadable spaCy model with an entity recognizer", "positional", None, str),
    source=("Data to annotate (file path or '-' to read from standard input)", "positional", None, str),
    api=("DEPRECATED: API loader to use", "option", "a", str),
    loader=("Loader (guessed from file extension if not set)", "option", "lo", str),
    label=("Comma-separated label(s) to annotate or text file with one label per line", "option", "l", get_labels),
    exclude=("Comma-separated list of dataset IDs whose annotations to exclude", "option", "e", split_string),
    unsegmented=("Don't split sentences", "flag", "U", bool),
    # Add custom recipe CLI arguments
    db_name=("Name of custom DB to load", "option", "db", str),
)
def custom(dataset, spacy_model, source=None, api=None, loader=None, label=None,
           exclude=None, unsegmented=False, db_name=None):
    # Call the built-in recipe and get back its dictionary of components
    components = make_gold(dataset, spacy_model, source, api, loader, label, exclude, unsegmented)
    # Overwrite recipe components returned by the recipe and use custom arguments
    components["db"] = LoadMyCustomDB(db_name)  # placeholder for your own DB loader
    # Overwrite config settings
    components["config"]["exclude_by"] = "task"
    # Return recipe components
    return components

The source argument passed into the recipe function can also be an already loaded stream generator. So instead of passing the source name forward, your recipe wrapper can load your data however it wants and then pass in the stream, like this:

stream = load_your_custom_stream()
components = make_gold(dataset, spacy_model, stream, api, loader, label, exclude, unsegmented)
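`load_your_custom_stream` above is just a placeholder – it can be any generator that yields tasks in Prodigy's JSON task format (dicts with a "text" key). A minimal sketch, with the texts hard-coded for illustration:

```python
def load_your_custom_stream():
    # Hypothetical loader: fetch texts from anywhere (a DB, an API, a file)
    # and yield them as Prodigy-style task dicts with a "text" key.
    texts = ["First document to annotate.", "Second document."]
    for text in texts:
        yield {"text": text}

stream = load_your_custom_stream()
```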

Thank you!