Custom recipes tutorial not working

Hi @ale,

Answering inline:

  1. I see that the choice answers are stored in an "accept" attribute in the JSON format. Is it possible to customize the name of this attribute?

It's not possible to customize it via recipe settings or arguments. You could modify it programmatically by adding a before_db callback to your recipe, which would essentially overwrite the task dictionary with the new key:

def before_db(examples):
    for eg in examples:
        accepted_options = eg.get("accept")
        if accepted_options:
            eg["my_custom_key"] = accepted_options
            # optionally remove the original key, though keeping the original annotation is recommended
            del eg["accept"]
    return examples

This callback should be returned from the recipe under the before_db key:

    return {
        "view_id": "choice",
        "dataset": dataset,
        "stream": stream,
        "exclude": exclude,
        "before_db": before_db,  # custom callback
        "config": {
            ...
        },
    }
  2. Comments in the text input field are saved to an attribute called "user_input" in the JSON format. Can this one also be customized?

Yes. You can customize the name of the attribute at the recipe level by specifying field_id in the view_id definition. Please check here for an example of how field_id should be used.
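
For illustration, here's a minimal sketch of a blocks definition where the text input is saved under a custom key; the block layout and the key name user_comment are assumptions, not your exact recipe:

blocks = [
    {"view_id": "choice"},
    # field_id controls the key the typed text is stored under in the task JSON,
    # so comments will be saved as "user_comment" instead of "user_input"
    {"view_id": "text_input", "field_id": "user_comment", "field_label": "Comments"},
]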

  3. If we add a new category in the future by updating the recipe, would it be an issue if we continue to save to the same database even though the previous examples lack the new categories in the "options" field?

No, Prodigy follows an "append only" policy with respect to storing annotation examples. So if you restart the server with a new label set, the examples that have more options will just be appended to the existing ones. You would need to consider how to use such a hybrid dataset for training, though. If the old examples could potentially be labelled with the new categories (but aren't, because the category didn't exist when the annotation was made), this can be really confusing to the model. This is why it is rarely a good idea to modify the label set during annotation. If possible, it is recommended to do a pilot annotation on a representative sample of the data to calibrate the label set. Once you're confident you have all the categories you need, you would proceed to the main annotation step.
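
For context, the "options" field is usually attached to each task in the stream by the recipe, so adding a category later only changes what gets attached from that point on. A minimal sketch of this pattern, with a hypothetical helper name and placeholder labels:

def add_options(stream, labels):
    # each option needs an "id" (what ends up in "accept") and a display "text"
    options = [{"id": label, "text": label} for label in labels]
    for eg in stream:
        eg["options"] = options
        yield eg

stream = add_options(stream, ["CATEGORY_A", "CATEGORY_B", "NEW_CATEGORY"])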

Another option, if you do find out that you've missed a category, would be to review the existing annotations with the new category included as an option or, even better, in a binary yes/no workflow (which will require some post-processing to compile the final annotation from the first multiple-choice pass and the binary pass). Yet another option would be to correct model mistakes (e.g. with textcat.correct).
In any case, you need to make sure that all final categories are well represented in your dev set so that you can see if the introduction of the new category is causing trouble.
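
To illustrate the post-processing mentioned above, here's a rough sketch of compiling the final annotations from the original multiple-choice dataset and a binary pass for the new category. The dataset names, the NEW_CATEGORY label and the merge logic are assumptions about your setup:

from prodigy.components.db import connect

db = connect()
multi_pass = db.get_dataset("choice_pass")      # original multiple-choice annotations
binary_pass = db.get_dataset("new_label_pass")  # binary yes/no pass for the new category

# Prodigy adds an _input_hash to every task, so the two passes can be lined up
binary_by_hash = {eg["_input_hash"]: eg for eg in binary_pass}

merged = []
for eg in multi_pass:
    combined = dict(eg)
    binary = binary_by_hash.get(eg["_input_hash"])
    if binary and binary.get("answer") == "accept":
        # the new category was accepted in the binary pass, so add it to the choice answers
        combined["accept"] = sorted(set(combined.get("accept", [])) | {"NEW_CATEGORY"})
    merged.append(combined)

db.add_dataset("merged_choice")
db.add_examples(merged, ["merged_choice"])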

  4. Is it possible to review the annotations of this custom recipe with the review recipe or do we need a custom review recipe too?

Yes, you will need a custom review recipe. It's impossible to make assumptions about the components of custom recipes, which is why review supports only built-in UIs. Also, you can only review one view_id at a time because otherwise the interface could become really hard to read.

  5. Related to the one above, how could we review only the NER and RE annotations for the accepted examples (and exclude the choice and text input answers)?

With review, you need to specify the view_id that the recipe is supposed to render. Please note that this won't work if you have modified the names of the keys under which the NER and RE annotations are stored.
So in this case, you should be able to review both NER and REL annotations by specifying relations as the view_id on the CLI and adding relations_span_labels with a list of all NER labels to prodigy.json, as described here. If the only difference is with respect to a span, it should also be rendered as differing versions in review.
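
For example, with placeholder dataset and label names:

prodigy review reviewed_rel_dataset rel_dataset --view-id relations

with the following added to your prodigy.json:

{
  "relations_span_labels": ["PERSON", "ORG", "PRODUCT"]
}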
