Hi,
I'm trying to run a custom review.py recipe with a notes text field, and I'm getting this error:
Command:
PRODIGY_ALLOWED_SESSIONS=cheyanne prodigy review dataset_review_test_v1 ner_task_v1_c-katrina,ner_task_v1_c-linnea -F /Users/cheyannebaird/posh/annotation-service/src/annotator/recipes/review.py --label ACCOUNT,ACTIVITY,AMOUNT,BANK,CARDINAL,DATE,FAC,FREQUENCY,GPE,LANGUAGE,ORDINAL,ORG,PERCENT,PERSON,STT_ERROR,TIME,VEHICLE
Error:
✘ Invalid components returned by recipe 'review'
dataset field required
{'stream': <prodigy.components.stream.Stream object at 0x1302fea00>, 'view_id': 'blocks', 'config': {'blocks': [{'view_id': 'spans_manual'}, {'view_id': 'text_input', 'field_label': 'Notes'}], 'labels': ['ACCOUNT', 'ACTIVITY', 'AMOUNT', 'BANK', 'CARDINAL', 'DATE', 'FAC', 'FREQUENCY', 'GPE', 'LANGUAGE', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'STT_ERROR', 'TIME', 'VEHICLE'], 'exclude_by': 'input', 'ner_manual_highlight_chars': False, 'auto_count_stream': True}}
The only changes I made to the built-in review.py include this at the end of the recipe:
    result = {}
    result.update(
        stream=stream,
        view_id="blocks",
        config={
            "blocks": [
                {"view_id": "spans_manual"},
                {"view_id": "text_input", "field_label": "Notes"}
            ],
            "labels": label,
            # "dataset": dataset,
            "exclude_by": "input",
            "ner_manual_highlight_chars": False,
            "auto_count_stream": True
        }
    )
    return result
And imports at the top:
import copy
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Optional, Tuple
import json
import prodigy
from typing import Dict, Generator, Iterable, List, Optional, Union
import spacy
from prodigy.components.loaders import get_stream
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens
from prodigy.core import recipe
from prodigy.recipes.spans import manual as prodigy_spans_manual
#from prodigy.types import RecipeSettingsType
#from prodigy.util import get_labels
from prodigy.components.db import Database, connect
from prodigy.components.decorators import support_both_streams
from prodigy.components.preprocess import fetch_media as fetch_media_preprocessor
from prodigy.components.stream import get_stream
from prodigy.core import recipe
from prodigy.types import RecipeSettingsType, StreamType, TaskType
from prodigy.util import (
    IGNORE_HASH_KEYS,
    INPUT_HASH_ATTR,
    SESSION_ID_ATTR,
    TASK_HASH_ATTR,
    VIEW_ID_ATTR,
    get_labels,
    log,
    msg,
    set_hashes,
    split_string,
)
I've provided the dataset in the command, so any idea why it's asking me for a dataset?
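One untested guess on my side: the error may be about the dict the recipe returns rather than the command line. The components printed in the error have no top-level dataset key, and the only place I mention it is the commented-out line inside config. Below is a minimal sketch of that check in plain Python, without Prodigy. REQUIRED_KEYS and missing_components are hypothetical names I made up for illustration, and the list of required keys is my assumption; only "dataset" is confirmed by the error message.

```python
# Hypothetical helper, not part of Prodigy: lists the top-level keys
# that are missing from the dict a recipe returns.
# Assumed key list; only "dataset" is confirmed by the error message.
REQUIRED_KEYS = ("dataset", "view_id", "stream")


def missing_components(components):
    """Return the assumed-required keys absent from the top level of the dict."""
    return [key for key in REQUIRED_KEYS if key not in components]


# Same shape as the dict printed in the error, with the Stream object
# replaced by an empty placeholder list and the config shortened.
returned = {
    "stream": [],
    "view_id": "blocks",
    "config": {
        "blocks": [
            {"view_id": "spans_manual"},
            {"view_id": "text_input", "field_label": "Notes"},
        ],
        "exclude_by": "input",
    },
}

print(missing_components(returned))  # ['dataset']

# Adding the dataset name at the top level (not inside "config")
# makes the check pass.
fixed = dict(returned, dataset="dataset_review_test_v1")
print(missing_components(fixed))  # []
```

If that is the cause, passing dataset=dataset as a top-level argument to result.update(...), next to stream and view_id, might resolve it. I have not confirmed this yet.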
When I run this command, which just uses the built-in recipe as is, I get what I am looking for: two datasets annotated with spans to review (no errors):
PRODIGY_ALLOWED_SESSIONS=cheyanne prodigy review review_ner_validation_test_v1_c ner_task_v1_c-katrina,ner_task_v1_c-linnea --label ACCOUNT,ACTIVITY,AMOUNT,BANK,CARDINAL,DATE,FAC,FREQUENCY,GPE,LANGUAGE,ORDINAL,ORG,PERCENT,PERSON,STT_ERROR,TIME,VEHICLE --view-id spans_manual
Result:
Thanks,
Cheyanne