Review recipe error: dataset field required

Hi,

I'm trying to run a custom review.py recipe with a notes text field, and I'm getting this error:

Command:

PRODIGY_ALLOWED_SESSIONS=cheyanne prodigy review dataset_review_test_v1 ner_task_v1_c-katrina,ner_task_v1_c-linnea -F /Users/cheyannebaird/posh/annotation-service/src/annotator/recipes/review.py --label ACCOUNT,ACTIVITY,AMOUNT,BANK,CARDINAL,DATE,FAC,FREQUENCY,GPE,LANGUAGE,ORDINAL,ORG,PERCENT,PERSON,STT_ERROR,TIME,VEHICLE

Error:

✘ Invalid components returned by recipe 'review'
dataset   field required

{'stream': <prodigy.components.stream.Stream object at 0x1302fea00>, 'view_id': 'blocks', 'config': {'blocks': [{'view_id': 'spans_manual'}, {'view_id': 'text_input', 'field_label': 'Notes'}], 'labels': ['ACCOUNT', 'ACTIVITY', 'AMOUNT', 'BANK', 'CARDINAL', 'DATE', 'FAC', 'FREQUENCY', 'GPE', 'LANGUAGE', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'STT_ERROR', 'TIME', 'VEHICLE'], 'exclude_by': 'input', 'ner_manual_highlight_chars': False, 'auto_count_stream': True}}

The only changes I made to the built-in review.py include this at the end of the recipe:

    result = {}
    result.update(
        stream=stream,
        view_id="blocks",
        config={
            "blocks": [
                {"view_id": "spans_manual"},
                {"view_id": "text_input", "field_label": "Notes"}
            ],
            "labels": label,
#            "dataset": dataset,
            "exclude_by": "input",
            "ner_manual_highlight_chars": False,
            "auto_count_stream": True
        }
    )

    return result

And imports at the top:

import copy
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Optional, Tuple

import json
import prodigy
from typing import Dict, Generator, Iterable, List, Optional, Union

import spacy
from prodigy.components.loaders import get_stream
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens
from prodigy.core import recipe
from prodigy.recipes.spans import manual as prodigy_spans_manual
#from prodigy.types import RecipeSettingsType
#from prodigy.util import get_labels

from prodigy.components.db import Database, connect
from prodigy.components.decorators import support_both_streams
from prodigy.components.preprocess import fetch_media as fetch_media_preprocessor
from prodigy.components.stream import get_stream
from prodigy.core import recipe
from prodigy.types import RecipeSettingsType, StreamType, TaskType
from prodigy.util import (
    IGNORE_HASH_KEYS,
    INPUT_HASH_ATTR,
    SESSION_ID_ATTR,
    TASK_HASH_ATTR,
    VIEW_ID_ATTR,
    get_labels,
    log,
    msg,
    set_hashes,
    split_string,
)

I've provided the dataset in the command, so any idea why it's asking me for a dataset?

When I run this command using the built-in recipe as is, I get what I am looking for: two datasets annotated with spans to review (no errors):

PRODIGY_ALLOWED_SESSIONS=cheyanne prodigy review review_ner_validation_test_v1_c ner_task_v1_c-katrina,ner_task_v1_c-linnea --label ACCOUNT,ACTIVITY,AMOUNT,BANK,CARDINAL,DATE,FAC,FREQUENCY,GPE,LANGUAGE,ORDINAL,ORG,PERCENT,PERSON,STT_ERROR,TIME,VEHICLE --view-id spans_manual

Result:

Thanks,
Cheyanne

Looking at the code, the dataset line is commented out, and it also sits under the config key. I think it should be at the top level of the returned dictionary instead. Maybe something like this:

    result = {}
    result.update(
        stream=stream,
        view_id="blocks",
        dataset=dataset,
        config={
            "blocks": [
                {"view_id": "spans_manual"},
                {"view_id": "text_input", "field_label": "Notes"}
            ],
            "labels": label,
            "exclude_by": "input",
            "ner_manual_highlight_chars": False,
            "auto_count_stream": True
        }
    )

    return result
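
To make the difference concrete, here is a minimal sketch of a check for where each key sits. The helper name `missing_top_level_keys` and the list of required keys are my own assumptions for illustration, based only on the error message above. This is not Prodigy's actual validation code.

```python
from typing import Any, Dict, List

# Assumption: keys expected at the top level of the returned dict.
# "dataset" is taken from the error message; the others appear in the
# recipe's return value shown in this thread.
REQUIRED_TOP_LEVEL = ("dataset", "view_id", "stream")


def missing_top_level_keys(components: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent from the top level."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in components]


# "dataset" nested under "config" does not count as a top-level key.
nested = {"stream": [], "view_id": "blocks", "config": {"dataset": "my_dataset"}}
top_level = {"stream": [], "view_id": "blocks", "dataset": "my_dataset", "config": {}}

print(missing_top_level_keys(nested))     # ['dataset']
print(missing_top_level_keys(top_level))  # []
```

The first dictionary mirrors the shape with the dataset under config, and the second mirrors the revised shape with the dataset next to stream and view_id.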

Does this help?

With the revised code above, I get the following results:

Command without the --view-id spans_manual flag, because I'm trying to handle it in the recipe:

PRODIGY_ALLOWED_SESSIONS=cheyanne prodigy review dataset_review_test_v1 ner_task_v1_c-katrina,ner_task_v1_c-linnea -F /Users/cheyannebaird/posh/annotation-service/src/annotator/recipes/review_spans.py --label ACCOUNT,ACTIVITY,AMOUNT,BANK,CARDINAL,DATE,FAC,FREQUENCY,GPE,LANGUAGE,ORDINAL,ORG,PERCENT,PERSON,STT_ERROR,TIME,VEHICLE

This shows "no blocks available", and my text input field is not present.

Command with the --view-id spans_manual flag:

PRODIGY_ALLOWED_SESSIONS=cheyanne prodigy review dataset_review_test_v1 ner_task_v1_c-katrina,ner_task_v1_c-linnea -F /Users/cheyannebaird/posh/annotation-service/src/annotator/recipes/review_spans.py --label ACCOUNT,ACTIVITY,AMOUNT,BANK,CARDINAL,DATE,FAC,FREQUENCY,GPE,LANGUAGE,ORDINAL,ORG,PERCENT,PERSON,STT_ERROR,TIME,VEHICLE --view-id spans_manual

Here, I get what I am looking for, except the text input field is still not present, so I think this final block of code is not recognizing the "blocks" view-id.

I figured I might have a proper go at this. So I start by annotating this dataset:

{"text": "Vincent, a great name."}
{"text": "Kevin, also a great name."}

I use this recipe call:

PRODIGY_ALLOWED_SESSIONS="vincent,foobar" PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}' python -m prodigy spans.manual spans-annot blank:en examples.jsonl --label NAME

And that gives me this interface.

I have annotations that disagree, so I'll confirm by using the normal review recipe.

python -m prodigy review reviewed-spans spans-annot --view-id spans_manual

So far, so good. But let's now try to customise this a bit with a text field. To do that, I only change the output dictionary.

Before

    return {
        "view_id": "review",
        "dataset": dataset,
        "stream": stream,
        "before_db": before_db,
        "config": config,
    }

After

    return {
        "view_id": "blocks",
        "dataset": dataset,
        "stream": stream,
        "before_db": before_db,
        "config": {
            **config,
            "blocks": [
                {"view_id": "review"},
                {"view_id": "text_input", "field_label": "Notes"}
            ],
        }
    }

When I now run this recipe ...

python -m prodigy review.custom reviewed-spans spans-annot --view-id spans_manual -F myreview.py --label NAME

... I see this:

I think the original issue was that the top-level view_id should be "blocks", while the definition of the blocks themselves should go under the config key.
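
If it helps, the same change can be sketched as a small wrapper around whatever dictionary the recipe already returns. The function name `add_notes_block` and the sample values are hypothetical and only for illustration. The sketch uses plain dictionaries so it runs without Prodigy installed.

```python
from typing import Any, Dict


def add_notes_block(
    components: Dict[str, Any],
    inner_view_id: str,
    field_label: str = "Notes",
) -> Dict[str, Any]:
    """Return a copy of the recipe components that shows the inner
    interface followed by a free-text field."""
    # Copy the existing config so the original settings are preserved.
    config = dict(components.get("config", {}))
    config["blocks"] = [
        {"view_id": inner_view_id},
        {"view_id": "text_input", "field_label": field_label},
    ]
    # The top-level view_id becomes "blocks"; the blocks live in config.
    return {**components, "view_id": "blocks", "config": config}


# Stand-in for the dictionary the review recipe returns (values are made up).
original = {
    "view_id": "review",
    "dataset": "reviewed-spans",
    "stream": [],
    "config": {"labels": ["NAME"]},
}
custom = add_notes_block(original, inner_view_id=original["view_id"])
print(custom["view_id"])
print(custom["config"]["blocks"])
```

Keeping the other top-level keys (dataset, stream and so on) untouched is the point here: only view_id and the blocks entry in config change.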

@cheyanneb let me know if this didn't work for you.
