I’m trying to write a custom recipe, and I’m copying the Plac annotations from core.py:recipe_args. The type value for an argument is often the string <value is a self-reference, replaced by this string>. Presumably this is some internal placeholder used by Prodigy. How does it work?
In particular, I’m asking because I want to take the path to a match patterns file as a positional argument, so I copied recipe_args['patterns'] and changed it as necessary. If I leave that string in place, I get this error:
ValueError: '<value is a self-reference, replaced by this string>' is not callable
Presumably the Prodigy internals replace this with the appropriate factory function.
Things seem to work if I omit this element from the Plac tuple. Is there any reason I should keep it in?
Could you post an example of your code or just the recipe decorator and function arguments you’re using?
recipe_args is a dictionary of Plac argument annotation tuples, which makes it easy for Prodigy to reuse them across the individual recipes. So in your recipe, it should look something like this:
import prodigy
from prodigy import recipe_args

@prodigy.recipe('my_recipe',
                patterns=recipe_args['patterns'])
def my_recipe(patterns=None):
    # patterns arrives as a pathlib.Path (or None if the option is omitted)
    print(patterns)
Under the hood, recipe_args is a dictionary that contains entries like this:
'patterns': ("Path to match patterns file", "option", "pt", Path)
This makes the argument an option that can be used as --patterns (if your argument is called patterns) or -pt, and the command-line input will be converted from a string to a pathlib.Path. You can also define your own argument annotations or leave them out completely (which means that all command-line arguments are going to be positional and passed in as strings).
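To make the role of each tuple element concrete, here is a simplified stand-in that translates Plac-style (help, kind, abbrev, type) tuples into a Python argparse parser. This is an illustration under assumptions, not Plac’s own code: build_parser and ANNOTATIONS are hypothetical names, and only the four-element tuple shape shown in this thread is handled.

```python
import argparse
from pathlib import Path

# Hypothetical stand-in for recipe_args; the 'patterns' tuple is copied from this thread.
ANNOTATIONS = {
    "patterns": ("Path to match patterns file", "option", "pt", Path),
}

def build_parser(annotations):
    """Hypothetical sketch: turn (help, kind, abbrev, type) tuples into argparse
    arguments. This is NOT Plac's implementation, only the same idea."""
    parser = argparse.ArgumentParser()
    for name, (help_text, kind, abbrev, converter) in annotations.items():
        if kind == "positional":
            # Positional: no abbreviation; the converter is applied to the raw string.
            parser.add_argument(name, help=help_text, type=converter)
            continue
        # Options and flags are reachable via --name and, if given, -abbrev.
        flags = [f"--{name}"]
        if abbrev:
            flags.insert(0, f"-{abbrev}")
        if kind == "flag":
            parser.add_argument(*flags, dest=name, action="store_true", help=help_text)
        else:
            parser.add_argument(*flags, dest=name, type=converter, default=None, help=help_text)
    return parser

args = build_parser(ANNOTATIONS).parse_args(["-pt", "patterns.jsonl"])
print(repr(args.patterns))  # a pathlib.Path instance, not a str
```

Both -pt patterns.jsonl and --patterns patterns.jsonl produce a pathlib.Path here, and leaving the option out gives None, matching the patterns=None default in the recipe signature.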
@recipe('ner.print-pattern-stream',
        spacy_model=recipe_args['spacy_model'],
        patterns=('Path to match patterns file',
                  'positional',
                  None,
                  '<value is a self-reference, replaced by this string>'),
        source=recipe_args['source'],
        api=recipe_args['api'],
        loader=recipe_args['loader'])
def print_pattern_stream(spacy_model, patterns, source=None, api=None, loader=None):
    ...etc...
I copied the patterns tuple from recipe_args['patterns'], changing 'option' to 'positional' and replacing the short argument name 'pt' with None.
If I change this to just patterns=('Path to match patterns file', 'positional') everything works fine.
The idea behind recipe_args is that you use them in place of the tuples (see my example above), not copy them. So instead of writing out the tuple, you can just use recipe_args['patterns'] in your recipes. This is the original tuple:
("Path to match patterns file", "option", "pt", Path)
In the compiled Python source, the Path reference (to pathlib.Path) seems to get replaced with that '<value is a self-reference, replaced by this string>' string. So if you just copy that output instead of using recipe_args['patterns'] in your script, you’ll end up with this string instead of the actual Path class.
The fourth element of the tuple is the argument type or a converter function. When the argument is passed in from the command line, it is called on the argument value. This also explains the error you’re seeing: you’re passing in a string, not a callable. Argument types can be built-ins like str, int or bool, but also any other callable, including classes and functions. So in this case, patterns.jsonl → Path('patterns.jsonl').
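Plac builds its command-line parser on Python’s argparse, and the error in the question matches the check that argparse’s add_argument runs on the type value. The stdlib-only sketch below (no Prodigy or Plac involved; try_type is a hypothetical helper) reproduces it: a class like Path is accepted and applied to the raw value, omitting the type (None) passes the raw string through unchanged, and the placeholder string is rejected because it is not callable.

```python
import argparse
from pathlib import Path

# The literal placeholder string from the question.
PLACEHOLDER = "<value is a self-reference, replaced by this string>"

def try_type(converter, raw="patterns.jsonl"):
    """Hypothetical helper: register one positional argument with the given
    type/converter, then parse `raw`; return argparse's error message on failure."""
    parser = argparse.ArgumentParser()
    try:
        # argparse verifies here that the type is callable.
        parser.add_argument("patterns", type=converter)
    except ValueError as err:
        return f"ValueError: {err}"
    return parser.parse_args([raw]).patterns

print(repr(try_type(Path)))   # a pathlib.Path built from the raw string
print(repr(try_type(None)))   # the raw string, unchanged
print(try_type(PLACEHOLDER))  # the "... is not callable" error from the question
```

This is also why dropping the fourth element works: without a converter, the value simply stays a string.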
I copied and modified recipe_args instead of just using it directly because in my standalone recipe the patterns file should be a positional rather than an optional argument. This probably wouldn’t be the case if this functionality were incorporated into the ner.print-stream recipe.
I didn’t know about the Path converter function. Here I think I don’t need it because I end up writing
model = PatternMatcher(spacy.load(spacy_model)).from_disk(patterns)
Yes, this makes sense. I think in the long run, you’ll probably build up your own set of argument annotations for your custom recipes that cover your needs and preferences. There’s actually quite a lot of cool stuff you can do with those (the Plac documentation has more examples).
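One way to keep the real Path converter while changing the argument kind is to derive a positional variant from the shared tuple instead of copying its printed form. A minimal sketch, assuming the tuple shape shown in this thread; as_positional is a hypothetical helper, and PATTERNS_OPTION stands in for recipe_args['patterns']:

```python
from pathlib import Path

# Stand-in for recipe_args['patterns']; shape taken from this thread.
PATTERNS_OPTION = ("Path to match patterns file", "option", "pt", Path)

def as_positional(annotation):
    """Hypothetical helper: keep the help text and converter (and any trailing
    elements), switch the kind to 'positional' and drop the abbreviation."""
    help_text, _kind, _abbrev, *rest = annotation
    # The abbreviation only applies to options and flags, so it becomes None.
    return (help_text, "positional", None, *rest)

PATTERNS_POSITIONAL = as_positional(PATTERNS_OPTION)
print(PATTERNS_POSITIONAL[:3], PATTERNS_POSITIONAL[3] is Path)
```

In a recipe this could be used as patterns=as_positional(recipe_args['patterns']), so the fourth element is the actual Path class rather than a copied placeholder string.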
The PRODIGY_README.html also includes a list of the most important, reusable recipe_args and their annotations, including the type or converter function.