Prodigy hashing behavior

magdaaniol · June 25, 2024, 8:24am

Some answers inline:

In the documentation, where are task_keys extracted from? The default is ("spans", "label", "options"). Are these from the recipe dictionary or attributes of each example or somewhere else?

Yes, these are extracted from the attributes of each example. The built-in recipes create task structures (dictionaries) that are specific to each recipe. So if you want the hashing function to use a custom task key, it has to be a top-level key on the task dictionary.
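To illustrate what "top-level key" means in practice, here is a toy sketch. The function sketch_task_hash is hypothetical and uses md5 over JSON purely for illustration; it is not Prodigy's hashing algorithm. It only mimics the lookup behaviour described above: every name in task_keys is read directly off the task dictionary, so a value nested under another key (here "meta") is never seen unless it is moved to the top level. In Prodigy itself, an extended tuple like this would be passed as task_keys to set_hashes.

```python
import hashlib
import json

# The defaults quoted in the question above.
DEFAULT_TASK_KEYS = ("spans", "label", "options")


def sketch_task_hash(task, task_keys=DEFAULT_TASK_KEYS):
    """Toy stand-in for a task hash (NOT Prodigy's real implementation).

    Each name in task_keys is looked up directly on the task dict,
    so values nested inside other keys never influence the result.
    """
    relevant = {key: task[key] for key in task_keys if key in task}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.md5(payload.encode("utf8")).hexdigest()


base = {"text": "Cats sleep a lot.", "meta": {"topic": "sleep"}}
nested = {**base, "meta": {"topic": "food"}}  # change hidden inside "meta"
top_level = {**base, "topic": "food"}         # same info as a top-level key
keys = DEFAULT_TASK_KEYS + ("topic",)         # hypothetical custom task key

print(sketch_task_hash(base, keys) == sketch_task_hash(nested, keys))     # True
print(sketch_task_hash(base, keys) == sketch_task_hash(top_level, keys))  # False
```

Note that the top-level "topic" only makes a difference because it was added to the task keys; with the defaults, base and top_level would still hash the same.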

Using the custom recipe cat-facts example from above, I ran a small test with two annotators: jane and joe. First, I annotated sentences with "labels": ["RELEVANT"] with jane. Then I changed the recipe's labels to "labels": ["CAT"] and annotated with joe. For both annotators, the same sentences have equal input hashes (expected) but also same task hashes. Shouldn't the task hashes be different because I'm using different labels?

The default keys used for computing the task_hash are spans, label, options and arcs. If you look closely, there's no label attribute on the custom task here: label is only set on the task for binary classification tasks. In this case, the labels config setting only determines which labels the UI offers, and the annotated labels are stored under spans, if there are any. So for NER, the task hash is affected by pre-existing spans, not by the set of available labels. The idea is to distinguish between the "kinds" of annotation (what is being annotated), not between particular label sets.
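The jane/joe test can be reconstructed as a small sketch. The fact text, the options and the hashed_fields helper are hypothetical; the helper just collects the values of the default task keys listed above rather than computing a real hash. Only the recipe config differs between the two sessions, so the values that feed into the task hash are identical, while a pre-existing span on the task changes them.

```python
# Default task keys as listed in the answer above.
TASK_KEYS = ("spans", "label", "options", "arcs")


def hashed_fields(task):
    """Top-level task values a task hash would be based on (simplified sketch)."""
    return {key: task[key] for key in TASK_KEYS if key in task}


# Hypothetical reconstruction of the two sessions: only the recipe config differs.
jane_config = {"labels": ["RELEVANT"]}  # drives the UI, is not part of the task
joe_config = {"labels": ["CAT"]}

options = [{"id": "yes", "text": "Yes"}, {"id": "no", "text": "No"}]  # hypothetical
fact = "Cats sleep for most of the day."
jane_task = {"text": fact, "options": options}
joe_task = {"text": fact, "options": options}

print(jane_config == joe_config)                            # False
print(hashed_fields(jane_task) == hashed_fields(joe_task))  # True -> same task hash

# Pre-existing spans live on the task itself, so they do change the result.
joe_with_span = {**joe_task, "spans": [{"start": 0, "end": 4, "label": "CAT"}]}
print(hashed_fields(jane_task) == hashed_fields(joe_with_span))  # False
```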

On a separate topic, is it possible to have more than one interface of the same type? For example, a custom recipe with two choice interfaces, each with different options.

Technically, you could define several choice blocks. You would need to add the respective options as the value of the "options" key in each block definition:

blocks = [
    {"view_id": "ner_manual"},
    # options and options2 are lists of option dicts defined in your recipe
    {"view_id": "choice", "text": None, "options": options},
    {"view_id": "choice", "text": None, "options": options2},
    {"view_id": "text_input", "field_rows": 3, "field_label": "Explain your decision"},
]

Please note that all answers are written under the same accept key, so if annotators should be able to select options in both blocks, you need to switch to the "multiple" choice style (the choice_style setting). With the "single" style, only one answer is permitted across both blocks. Also, by default both blocks use the same keyboard shortcuts, so you might want to change them or disable them completely via custom JavaScript.
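Because both blocks write into the same accept list, one way to tell the answers apart afterwards is to give each block's options a distinct id prefix. The sketch below assumes that convention (the option ids, the prefixes and the split_answers helper are hypothetical, not part of Prodigy) and sets choice_style to "multiple" in the recipe config.

```python
# Hypothetical option ids: a prefix per block keeps the answers apart, because
# both choice blocks write their selections into the same "accept" list.
options = [
    {"id": "q1_relevant", "text": "Relevant"},
    {"id": "q1_irrelevant", "text": "Irrelevant"},
]
options2 = [
    {"id": "q2_true", "text": "True"},
    {"id": "q2_false", "text": "False"},
]

blocks = [
    {"view_id": "ner_manual"},
    {"view_id": "choice", "text": None, "options": options},
    {"view_id": "choice", "text": None, "options": options2},
]

# "multiple" lets the annotator select options in both blocks.
config = {"blocks": blocks, "choice_style": "multiple"}


def split_answers(example, prefixes=("q1_", "q2_")):
    """Group the ids in an annotated example's "accept" list by block prefix."""
    accepted = example.get("accept", [])
    return {prefix: [a for a in accepted if a.startswith(prefix)] for prefix in prefixes}


annotated = {"text": "Cats purr.", "accept": ["q1_relevant", "q2_false"], "answer": "accept"}
print(split_answers(annotated))  # {'q1_': ['q1_relevant'], 'q2_': ['q2_false']}
```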

If you want more flexibility or control over the final UI, you can always use custom HTML and JavaScript to build your own form with multiple checkboxes or radio button groups. The window.prodigy.update callback lets you update the current task with any custom data, such as information about which checkbox was selected. Here's a straightforward example of a custom checkbox:
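A minimal sketch of such a checkbox, assuming an html block with an html_template and custom JavaScript passed via the javascript config setting. The needs_review field, the element id and the reset on the prodigyanswer event are our own illustrative choices; check the custom interfaces documentation for the exact event names in your Prodigy version.

```python
# Sketch of a custom checkbox: an html block renders the input, and custom
# JavaScript copies its state onto the current task via window.prodigy.update.
# The field name "needs_review" and the element id are hypothetical.
checkbox_html = (
    '<label><input type="checkbox" id="needs-review" onchange="updateNeedsReview()" />'
    " Needs review</label>"
)

checkbox_js = """
function updateNeedsReview() {
    // Write the checkbox state onto the current task.
    const checked = document.getElementById('needs-review').checked;
    window.prodigy.update({ needs_review: checked });
}

// Untick the box after each answer so it doesn't carry over to the next task.
document.addEventListener('prodigyanswer', () => {
    const box = document.getElementById('needs-review');
    if (box) box.checked = false;
});
"""

blocks = [
    {"view_id": "ner_manual"},
    {"view_id": "html", "html_template": checkbox_html},
    {"view_id": "text_input", "field_rows": 3, "field_label": "Explain your decision"},
]

# Goes under the "config" key of the dict returned by the recipe.
config = {"blocks": blocks, "javascript": checkbox_js}
```

After annotation, ticked examples would carry "needs_review": true on the saved task.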

Is it possible to add a static question (or title) above the choice interface in the cat-facts recipe?

Yes, you can achieve that by adding an html block above the existing choice block:

blocks = [
    {"view_id": "ner_manual"},
    {"view_id": "html"},
    {"view_id": "choice", "text": None, "html": None, "options": options},
    {"view_id": "text_input", "field_rows": 3, "field_label": "Explain your decision"},
]

Note that, as with text, html has to be set to None in the choice block definition to prevent the content from appearing twice.
The html view_id expects an html field on the task, so that field has to be added when you create the tasks:

import requests

def get_stream():
    # `options` comes from the enclosing recipe function
    res = requests.get("https://cat-fact.herokuapp.com/facts").json()
    for fact in res:
        yield {"text": fact["text"], "options": options, "html": "<h2>This is my static question</h2>"}

You can also add extra styling, of course. Please check the custom interfaces section on HTML and CSS for examples.
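If the stream comes from somewhere other than the cat-facts API, the same idea can be applied as a small wrapper that adds the html field to every task. The add_static_question helper below is hypothetical; it escapes the question text with Python's html.escape so that characters like < or & don't break the markup.

```python
import html


def add_static_question(stream, question):
    """Yield copies of the tasks with the same static question as an html field."""
    question_html = f"<h2>{html.escape(question)}</h2>"
    for task in stream:
        # Copy the task so the original dicts are left untouched.
        yield {**task, "html": question_html}


tasks = [{"text": "Cats purr."}, {"text": "Cats nap."}]
for task in add_static_question(tasks, "Is this fact about cat behaviour?"):
    print(task["html"])
```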

Is it possible to add theme options to the recipe so that it is not necessary to specify them in prodigy.json? For example, relationHeight and relationHeightWrap from the documentation.

Yes. Prodigy merges the configuration from the global and local prodigy.json files, any CLI overrides, and the config key returned from the recipe. So you can return a custom_theme dictionary under the config key of the dictionary returned from the recipe:

  return {
        "dataset": dataset,          # the dataset to save annotations to
        "view_id": "blocks",         # set the view_id to "blocks"
        "stream": stream,            # the stream of incoming examples
        "config": {
            "labels": ["CAT"],  # the labels for the manual NER interface
            "blocks": blocks,  # add the blocks to the config
            "custom_theme": {"buttonSize": 500}  # set custom theme options    
        }
    }
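Applied to the two settings from the question, the config part could look like this. The numeric values are placeholders rather than Prodigy's defaults, and make_config is a hypothetical helper.

```python
def make_config(blocks):
    """Recipe config with theme settings inline instead of in prodigy.json (sketch)."""
    return {
        "labels": ["CAT"],
        "blocks": blocks,
        # Placeholder values: pick whatever suits your relation annotations.
        "custom_theme": {"relationHeight": 150, "relationHeightWrap": 40},
    }
```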
