A question about custom recipes


I am working on customizing my recipes following the page "https://prodi.gy/docs/custom-recipes".

For the example "Example: Custom interfaces with choice, manual NER, free-form input and custom API loader", could you please let me know how to create "cat_facts_data"?


cat_facts_data in the example command is the first argument of the recipe, i.e. the name of the dataset the annotations will be saved to. You can choose any name: when you annotate, Prodigy creates this dataset in your database and saves your annotations to it. You can then export your annotations using the db-out command with the name of your dataset, i.e. cat_facts_data.
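The exported annotations are plain JSONL, so a few lines of Python are enough to inspect them. A minimal sketch, assuming you ran prodigy db-out cat_facts_data > cat_facts_data.jsonl and that each example carries Prodigy's "answer" key; load_annotations and count_answers are hypothetical helper names:

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, List


def load_annotations(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL lines (e.g. the output of `prodigy db-out`) into dicts."""
    examples = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            examples.append(json.loads(line))
    return examples


def count_answers(examples: List[Dict[str, Any]]) -> Counter:
    """Tally the "answer" field; examples without one count as "missing"."""
    return Counter(eg.get("answer", "missing") for eg in examples)
```

For example: with open("cat_facts_data.jsonl", encoding="utf8") as f: print(count_answers(load_annotations(f))).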

Thank you for your reply.

If I have raw data and patterns, how can I automatically create a dataset like the one that is saved during annotation?


Check out these threads for details:

Thank you very much for your reply.

I am working on the recipe ner_manual.py from the explosion/prodigy-recipes repo on GitHub, and found that it differs from the built-in recipe "ner.manual". For example, "ner_manual.py" has no patterns option, but the built-in "ner.manual" recipe does.

Could you please let me know where I can find the source code for the built-in recipe "ner.manual"?


The example recipes in the repo are slightly simplified and set up to work as standalone functions – we definitely want to update them for the next release, though, to reflect some of the newer features we've added.

If you want to see the exact code Prodigy runs in the built-in recipes, you can always run prodigy stats to find the location of your local Prodigy installation, and then check out the Python files in prodigy/recipes.
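If you'd rather get that path from a script than read it off the prodigy stats output, the standard library's importlib.util.find_spec can locate the package. A small sketch, assuming Prodigy is installed in the same environment as the Python interpreter you run it with; package_dir is a hypothetical helper:

```python
import importlib.util
import os
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Return the directory of an installed package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin points at its __init__ file
    return os.path.dirname(spec.origin)


prodigy_dir = package_dir("prodigy")
if prodigy_dir is not None:
    # The built-in recipe files are in the recipes subfolder
    print(os.path.join(prodigy_dir, "recipes"))
```

prodigy stats is still the simplest option; this only helps if you need the path programmatically.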

Thank you very much for your suggestions.

Now I have raw data and patterns, and I use spaCy to generate a "doc".

You mentioned: "you can create data in Prodigy's format pretty easily using the processed doc. ... You can then add it to a dataset using Prodigy's database API: https://prodi.gy/docs/api-database#database"

Could you please share some sample code to generate a Prodigy dataset from a spaCy "doc"?


Check out the code snippet I shared in my post above:

The dictionary you create here (example) is an entry in Prodigy's JSON format.
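A minimal sketch of that idea, assuming entity spans given as character offsets (for a spaCy doc, [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]) and using the database API's connect, datasets, add_dataset and add_examples plus Prodigy's set_hashes. The names make_task and save_to_dataset are hypothetical:

```python
from typing import Any, Dict, List, Tuple

# (start_char, end_char, label), e.g. built from a spaCy Doc with
# [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
Entity = Tuple[int, int, str]


def make_task(text: str, entities: List[Entity]) -> Dict[str, Any]:
    """Build one example in Prodigy's JSON format with character-offset spans."""
    spans = [{"start": s, "end": e, "label": label} for s, e, label in entities]
    return {"text": text, "spans": spans, "answer": "accept"}


def save_to_dataset(examples: List[Dict[str, Any]], dataset: str) -> None:
    """Add examples to a Prodigy dataset (requires Prodigy to be installed)."""
    # Imported lazily so make_task can be used without Prodigy installed
    from prodigy import set_hashes
    from prodigy.components.db import connect

    db = connect()  # uses the database settings from your prodigy.json
    if dataset not in db.datasets:
        db.add_dataset(dataset)
    db.add_examples([set_hashes(eg) for eg in examples], datasets=[dataset])
```

Interfaces like ner.manual may also expect a "tokens" list on each example, which this sketch leaves out.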

Thank you for your suggestions. They are very helpful.

Now I am working on changing the content shown during labeling. I tried the following function:

stream = filter_inputs(stream, filter_list)

I was wondering if I can change the stream with a different list each time I click the green (accept) button.



In general, you can always write your own filter function in your custom recipe – streams are regular Python generators, so you can do something like this and apply any filtering you need:

def filter_stream(stream):
    for eg in stream:
        # filter based on some properties in the example here
        yield eg
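As a concrete (hypothetical) criterion, this version drops any example whose "text" appears in an exclusion set. Because it's a generator, each example is only checked when it's pulled from the stream:

```python
from typing import Any, Dict, Iterable, Iterator, Set


def filter_stream(
    stream: Iterable[Dict[str, Any]], excluded_texts: Set[str]
) -> Iterator[Dict[str, Any]]:
    """Yield only examples whose "text" is not in excluded_texts."""
    for eg in stream:
        if eg.get("text") in excluded_texts:
            continue  # skip this example entirely
        yield eg
```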

The update callback also gives you access to the batches of annotated examples that are submitted in the UI. You could then store any information about those already annotated examples in a variable in your recipe function that the filter_stream generator also has access to. This way, it can respond to collected annotations (which is also how annotating with a model in the loop works under the hood).

def update(answers):
    # do something with the answers here
    pass

One thing to keep in mind is that the stream and answers are sent in batches. So any update you make based on collected annotations will only affect the next batch that's being created afterwards (not the examples that are already queued up for annotation in the app).
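A minimal sketch of that shared-state pattern: update records accepted labels in a set that the stream generator checks before yielding each example. The skip-by-label rule and make_stream_and_update are hypothetical; in a real recipe you'd return the two objects under the "stream" and "update" keys of the recipe's components dict. This simulation pulls examples one at a time, so the effect is immediate; in the app, examples already sent in a batch aren't affected.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Set, Tuple

Example = Dict[str, Any]


def make_stream_and_update(
    source: Iterable[Example],
) -> Tuple[Iterator[Example], Callable[[List[Example]], None]]:
    """Return a (stream, update) pair that share state via a closure.

    Hypothetical rule: once an example with a given "label" is accepted,
    later examples with the same label are skipped.
    """
    accepted_labels: Set[str] = set()

    def stream() -> Iterator[Example]:
        for eg in source:
            # Checked lazily, so it sees labels added by earlier update() calls
            if eg.get("label") in accepted_labels:
                continue
            yield eg

    def update(answers: List[Example]) -> None:
        for eg in answers:
            if eg.get("answer") == "accept":
                accepted_labels.add(eg.get("label"))

    return stream(), update
```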