Custom recipes tutorial not working

Hi,

I was going through the custom recipe tutorial on Custom Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP, but it is not working. Specifically, I'm trying to run the cat facts example (code copy/pasted from the webpage below). It seems like the cat facts API has changed, but even accounting for that I'm still getting the error:

...
stream.apply(add_tokens, nlp=nlp, stream=stream)  # tokenize the stream for ner_manual
AttributeError: 'generator' object has no attribute 'apply'

Code (from webpage):

import prodigy
from prodigy.components.preprocess import add_tokens
import requests
import spacy

@prodigy.recipe("cat-facts")
def cat_facts_ner(dataset, lang="en"):
    # We can use the blocks to override certain config and content, and set
    # "text": None for the choice interface so it doesn't also render the text
    blocks = [
        {"view_id": "ner_manual"},
        {"view_id": "choice", "text": None},
        {"view_id": "text_input", "field_rows": 3, "field_label": "Explain your decision"}
    ]
    options = [
        {"id": 3, "text": "😺 Fully correct"},
        {"id": 2, "text": "😼 Partially correct"},
        {"id": 1, "text": "😾 Wrong"},
        {"id": 0, "text": "🙀 Don't know"}
    ]

    def get_stream():
        res = requests.get("https://cat-fact.herokuapp.com/facts").json()
        for fact in res["all"]:
            yield {"text": fact["text"], "options": options}

    nlp = spacy.blank(lang)           # blank spaCy pipeline for tokenization
    stream = get_stream()             # set up the stream
    stream.apply(add_tokens, nlp=nlp, stream=stream)  # tokenize the stream for ner_manual

    return {
        "dataset": dataset,          # the dataset to save annotations to
        "view_id": "blocks",         # set the view_id to "blocks"
        "stream": stream,            # the stream of incoming examples
        "config": {
            "labels": ["RELEVANT"],  # the labels for the manual NER interface
            "blocks": blocks         # add the blocks to the config
        }
    }

How can I fix these issues? Also, is there a more in-depth guide for custom recipes?

Thanks

Hi @ale,

Sorry about the outdated example! I've just updated the website so you should be able to recreate it without errors now.
The main things I changed were:

  1. the processing of the API response (it now returns a plain list)
  2. how add_tokens is applied to the stream. It looks like we'd updated this example to use the newer API (concretely, the apply method of the Stream component, see Components and Functions in the Prodigy docs), but the source in this example is an old-style generator function, so the newer API can't be used here. See the sketch below.
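For reference, here's roughly what the two fixes look like inside the recipe (a sketch, assuming the API still returns fact objects with a "text" field):

def get_stream():
    res = requests.get("https://cat-fact.herokuapp.com/facts").json()
    for fact in res:  # the response is now a plain list of facts
        yield {"text": fact["text"], "options": options}

nlp = spacy.blank(lang)                 # blank spaCy pipeline for tokenization
stream = add_tokens(nlp, get_stream())  # classic helper: wraps the generator directly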

As for the in-depth guide for custom recipes, we have this section in the docs. I assume you've seen it already, so let us know if there's any particular aspect you'd like more info on.
If you'd like to see some end-to-end examples of custom recipes, I recommend checking this repository, in particular the tutorials folder (though the other folders are worth a look too).

Thanks @magdaaniol! It is working now.

I have a few questions about this custom recipe and my use case. First, a little background: my team is doing NER and RE at the same time with rel.manual. We want to categorize tricky cases we find in Prodigy and also allow annotators to leave a comment about what they found difficult in a sentence.

  1. I see that the choice answers are stored in an "accept" attribute in the JSON format. Is it possible to customize the name of this attribute?
  2. Comments in the text input field are saved to an attribute called "user_input" in the JSON format. Can this one also be customized?
  3. We have identified categories of tricky cases. In our case, we would use the "choice" interface to select the type of tricky case when we encounter one. If we add a new category in the future by updating the recipe, would it be an issue if we continue to save to the same database even though the previous examples lack the new categories in the "options" field?
  4. Is it possible to review the annotations of this custom recipe with the review recipe or do we need a custom review recipe too?
  5. Related to the one above, how could we review only the NER and RE annotations for the accepted examples (and exclude the choice and text input answers)?

Thanks!

Hi @magdaaniol,

I continued experimenting with custom recipes and have other questions. Here is the custom recipe I'm building:

import prodigy
from prodigy.core import Arg, recipe
from prodigy.components.stream import get_stream
from prodigy.components.preprocess import add_tokens
import spacy

@prodigy.recipe(
    "test-recipe",
    dataset=Arg(help="Dataset to save answers to."),
    file_path=Arg(help="Path to texts")
)
def test_recipe(dataset: str, file_path):
    stream = get_stream(file_path) # load in the JSON file

    blocks = [
        {"view_id": "ner_manual"},
        {"view_id": "choice", "text": None, "options": [{"id": "option_1", "text": "Option 1"}]},
        {"view_id": "text_input", "field_rows": 3, "field_label": "Comments", "field_id": "comments"}
    ]

    return {
        "dataset": dataset,
        "view_id": "blocks",
        "stream": stream,
        "config": {
            "labels": ["LABEL"],
            "blocks": blocks,
            "choice_style": "multiple"
        }
    }

My questions are:

  1. When I run the recipe I get the following warning:
⚠ Prodigy automatically assigned an input/task hash because it was
missing. This automatic hashing will be deprecated as of Prodigy v2 because it
can lead to unwanted duplicates in custom recipes if the examples deviate from
the default assumptions. More information can be found on the docs:
https://prodi.gy/docs/api-components#set_hashes

What should I do to have hashing consistent with the default behaviour in Prodigy? I see the documentation suggests:

from prodigy import set_hashes

stream = (set_hashes(eg) for eg in stream)
stream = (set_hashes(eg, input_keys=("text", "custom_text")) for eg in stream)
  2. I have added the options for the choice component in the blocks view_id, and it seems to be working fine there. Can I add the options there instead of adding them to each example with an add_options function, as shown in the documentation webpage?

  3. The text is not being displayed for the ner_manual task when I run this recipe. Do you know what is going wrong?

If there are any Prodigy best practices that I should incorporate into this recipe, it would be very useful to know.

Thanks

Hi @ale,

Answering inline:

  1. I see that the choice answers are stored in an "accept" attribute in the JSON format. Is it possible to customize the name of this attribute?

It's not possible to customize it via recipe settings or arguments. You could modify it programmatically by adding a before_db callback to your recipe, which would essentially rewrite the task dictionary with the new key:

def before_db(examples):
    for eg in examples:
        accepted_options = eg.get("accept")
        if accepted_options:
            eg["my_custom_key"] = accepted_options
            del eg["accept"]  # optional: removes the original key, though keeping it is generally recommended
    return examples

This callback should be returned from the recipe under the before_db key:

    return {
        "view_id": "choice",
        "dataset": dataset,
        "stream": stream,
        "exclude": exclude,
        "before_db": before_db, # custom callback
        "config": {
            ...
        },
    }
  2. Comments in the text input field are saved to an attribute called "user_input" in the JSON format. Can this one also be customized?

Yes. You can customize the name of the attribute from the recipe level by specifying field_id in the view_id definition. Please check here for an example of how field_id should be used.
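For example, in the blocks definition (a sketch; "comments" is just an illustrative field ID):

blocks = [
    {"view_id": "text_input", "field_rows": 3,
     "field_label": "Comments", "field_id": "comments"}  # saved under "comments" instead of "user_input"
]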

  3. If we add a new category in the future by updating the recipe, would it be an issue if we continue to save to the same database even though the previous examples lack the new categories in the "options" field?

No, Prodigy follows an "append-only" policy with respect to storing annotation examples. So if you restart the server with a new label set, the examples that have more options will just be appended to the existing ones. You would need to consider how to use such a hybrid dataset for training, though. If the old examples could potentially be labelled with the new categories (but aren't, because the category didn't exist when the annotation was made), this can be really confusing to the model. This is why it is rarely a good idea to modify the label set during annotation. If possible, it is recommended to do a pilot annotation on a representative sample of the data to calibrate the label set. Once you're confident you have all the categories you need, you would proceed to the main annotation step.

Another option, if you do find that you've missed a category, would be to review the existing annotations with the new category included as an option or, even better, in a binary yes/no workflow (which will require some post-processing to compile the final annotation from the first multiple-choice pass and the binary pass). Yet another option would be to correct model mistakes (e.g. with textcat.correct).
In any case, you need to make sure all final categories are well represented in your dev set so that you can see whether the introduction of the new category is causing problems.
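If you go the re-review route, one way to re-serve previously saved examples with an updated options list is via the Database API (a sketch; tricky_cases and NEW_OPTIONS are placeholders, and get_dataset_examples assumes a recent Prodigy version):

from prodigy.components.db import connect

NEW_OPTIONS = [{"id": 4, "text": "New tricky case"}]  # hypothetical new category

def stream_saved_examples(dataset_name):
    db = connect()  # connect to the Prodigy database
    for eg in db.get_dataset_examples(dataset_name):  # previously annotated tasks
        eg["options"] = eg.get("options", []) + NEW_OPTIONS  # expose the new choice
        yield eg

stream = stream_saved_examples("tricky_cases")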

  4. Is it possible to review the annotations of this custom recipe with the review recipe or do we need a custom review recipe too?

Yes, you will need a custom review recipe. It's impossible to make assumptions about the components of custom recipes, which is why review supports only the built-in UIs. Also, you can only review one view_id at a time, because otherwise the interface could become really illegible.

  5. Related to the one above, how could we review only the NER and RE annotations for the accepted examples (and exclude the choice and text input answers)?

In review you need to specify the view_id that the recipe is supposed to render. Please note that this will be impossible if you modified the names of the keys under which the NER and RE annotations are stored.
So in this case, you should be able to review both NER and REL by specifying relations as the view_id on the CLI and adding relations_span_labels with a list of all NER labels to prodigy.json, as described here. If the only difference between versions is in a span, it should also be rendered as differing versions in review.
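Concretely, that could look something like this (dataset names are placeholders and the labels should be your actual NER labels):

# prodigy.json
{"relations_span_labels": ["RELEVANT"]}

# CLI
prodigy review reviewed_dataset annotated_dataset --view-id relations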


Hi @ale:

If you don't need a custom hashing function (which you might if, for example, you have custom fields that should be used to distinguish the examples), it is fine to just let Prodigy do it. The task and input hashes will be consistent. The warning is just there to inform you that from Prodigy v2 on, the user will have to take care of it to make sure they are in full control.
What Prodigy currently does automatically is call set_hashes under the hood with the default task keys. You can consult what the default keys are in the set_hashes documentation.
Also, if your recipe modifies the task with respect to the keys used in the hashing function, e.g. it adds annotations from patterns or a model (which is not the case here), it is recommended to call set_hashes after the modification to reflect these changes, as in the sketch below.
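For example (a sketch; rehash is a hypothetical helper, and overwrite=True forces the hashes to be recomputed):

from prodigy import set_hashes

def rehash(stream):
    for eg in stream:
        # recompute hashes so they reflect the final, modified task
        yield set_hashes(eg, overwrite=True)

stream.apply(rehash, stream=stream)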

I have added the options for the choice component in the blocks view_id. It seems to be working fine there. Can I add the options there instead of adding them to each example with an add_options function as shown in the documentation webpage?

Yes, you could, but then you'd have no record of what the annotator had to choose from. It's always recommended to store all the information required to recreate the annotation task, and that includes the available options. Also, the Prodigy train recipe uses this options field and wouldn't be able to generate spaCy examples from the annotations if it is missing.
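For completeness, the per-example pattern from the docs looks roughly like this (a sketch, assuming options is the same list you currently define in the blocks config):

def add_options(stream, options):
    for eg in stream:
        eg["options"] = options  # keep a record of the available choices on each task
        yield eg

stream.apply(add_options, stream=stream, options=options)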

  3. The text is not being displayed for the ner_manual task when I run this recipe. Do you know what is going wrong?

As explained in this example, tokens are required for the ner_manual view_id. To add them you can use the add_tokens helper (which you are already importing). You will also need a spaCy tokenizer; here I'm using the basic spaCy tokenizer for English. Adding the following lines should make the recipe show all the blocks:

nlp = spacy.blank("en")  # blank English pipeline for tokenization
stream.apply(add_tokens, stream=stream, nlp=nlp)  # add a "tokens" key to each task

Nothing crucial comes to mind on top of what I've said already. Storing the options on the example is probably one of the more important "good practice" pointers.

