Hello Prodigy team! I'm writing a custom review recipe and am trying to understand the reasoning behind the default hash keys.
When tasks are hashed by simply calling `set_hashes`, the following keys are used:
```python
from prodigy.util import IGNORE_HASH_KEYS, INPUT_HASH_KEYS, TASK_HASH_KEYS

print("INPUT_HASH_KEYS:", INPUT_HASH_KEYS)
print("TASK_HASH_KEYS:", TASK_HASH_KEYS)
print("IGNORE_HASH_KEYS:", IGNORE_HASH_KEYS)
```

which prints:

```
INPUT_HASH_KEYS: ('text', 'image', 'html', 'input')
TASK_HASH_KEYS: ('spans', 'label', 'options', 'arcs')
IGNORE_HASH_KEYS: ('score', 'rank', 'model', 'source', 'pattern', 'priority', 'path', '_view_id', '_session_id', '_annotator_id', 'answer')
```
It seems that the original idea was for `_input_hash` to capture the input and for `_task_hash` to capture what needs to be done with it. Is that right?
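To make that reading concrete, here is a minimal sketch of the split as I understand it. It is not Prodigy's implementation: `sketch_set_hashes` and `_digest` are hypothetical helpers, md5 over sorted JSON is a stand-in for whatever Prodigy actually uses, and deriving the task hash from the input hash plus the task keys is an assumption on my part.

```python
import hashlib
import json

# Stand-ins for the default key tuples printed above.
INPUT_KEYS = ("text", "image", "html", "input")
TASK_KEYS = ("spans", "label", "options", "arcs")


def _digest(data):
    """Stable digest of a JSON-serializable value (md5 here, not Prodigy's algorithm)."""
    payload = json.dumps(data, sort_keys=True).encode("utf8")
    return hashlib.md5(payload).hexdigest()


def sketch_set_hashes(task, input_keys=INPUT_KEYS, task_keys=TASK_KEYS):
    """Return a copy of `task` with `_input_hash` and `_task_hash` added (hypothetical)."""
    task = dict(task)
    input_data = {k: task[k] for k in input_keys if k in task}
    task["_input_hash"] = _digest(input_data)
    # Assumption: the task hash combines the input hash with the task-defining keys.
    task_data = {k: task[k] for k in task_keys if k in task}
    task["_task_hash"] = _digest([task["_input_hash"], task_data])
    return task


a = sketch_set_hashes({"text": "Apple is a company", "label": "ORG"})
b = sketch_set_hashes({"text": "Apple is a company", "label": "PRODUCT"})
print(a["_input_hash"] == b["_input_hash"])  # True: same input
print(a["_task_hash"] == b["_task_hash"])  # False: different question
```

Under this reading, the same text asked with two different labels is one input but two tasks.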
By the way, is there a reason `relations` isn't in `TASK_HASH_KEYS`?
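The practical effect I'm worried about can be shown with a small stand-in (the `task_digest` helper and the simplified relation dict are hypothetical, and md5 replaces Prodigy's real hashing): two tasks that differ only in their relations collide when only the default task keys are hashed.

```python
import hashlib
import json

DEFAULT_TASK_KEYS = ("spans", "label", "options", "arcs")


def task_digest(task, task_keys):
    """Hash only the task-defining keys of `task` (illustrative stand-in)."""
    data = {k: task[k] for k in task_keys if k in task}
    return hashlib.md5(json.dumps(data, sort_keys=True).encode("utf8")).hexdigest()


base = {"text": "Alex works at Acme", "spans": [{"start": 0, "end": 4, "label": "PERSON"}]}
# Simplified, hypothetical relation shape; real relation tasks carry more fields.
with_rel = dict(base, relations=[{"head": 0, "child": 3, "label": "WORKS_AT"}])

# With the default keys, the relations are invisible to the hash ...
print(task_digest(base, DEFAULT_TASK_KEYS) == task_digest(with_rel, DEFAULT_TASK_KEYS))  # True
# ... adding "relations" makes the two tasks distinguishable.
extended = DEFAULT_TASK_KEYS + ("relations",)
print(task_digest(base, extended) == task_digest(with_rel, extended))  # False
```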
Now, when I look at the review recipe, the hash keys become:
```python
INPUT_KEYS = ("text", "image", "html", "options", "audio", "video")
TASK_KEYS = ("spans", "label", "accept", "audio_spans", "relations")
```
`options` moved into the input keys, while `accept` (the selected options) and `relations` were added to the task keys.
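A quick way to see which tasks this change affects, again with a hypothetical `input_digest` helper and md5 as a stand-in for Prodigy's hashing: a choice task with `options` gets a different input hash under the review keys, while a plain-text task hashes the same under both key sets.

```python
import hashlib
import json

DEFAULT_INPUT_KEYS = ("text", "image", "html", "input")
REVIEW_INPUT_KEYS = ("text", "image", "html", "options", "audio", "video")


def input_digest(task, input_keys):
    """Hash only the given input keys of `task` (illustrative stand-in, not Prodigy's hash)."""
    data = {k: task[k] for k in input_keys if k in task}
    return hashlib.md5(json.dumps(data, sort_keys=True).encode("utf8")).hexdigest()


choice_task = {
    "text": "Is this spam?",
    "options": [{"id": "SPAM", "text": "Spam"}, {"id": "HAM", "text": "Not spam"}],
}
text_task = {"text": "Apple is a company"}

# Tasks with options get a different input hash under the review keys ...
print(input_digest(choice_task, DEFAULT_INPUT_KEYS) == input_digest(choice_task, REVIEW_INPUT_KEYS))  # False
# ... while plain-text tasks are unaffected.
print(input_digest(text_task, DEFAULT_INPUT_KEYS) == input_digest(text_task, REVIEW_INPUT_KEYS))  # True
```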
So it seems the meaning of `_task_hash` shifts from "what needs to be done" to "what was actually done"?
I have no problem with the task hash changing based on user actions. However, when the input hash keys change, we lose the ability to link the reviewed results back to the original documents. What are your thoughts on that? I guess we could re-hash the reviewed result with the original keys in `before_db`?
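Roughly what I have in mind for that `before_db` step, as a sketch only: `stand_in_hash`, `original_input_hash` and the `_orig_input_hash` field are hypothetical names, and in a real recipe the stand-in would presumably be replaced by Prodigy's own `set_hashes` called with the original keys, so the values actually match what is stored for the original annotations.

```python
import hashlib
import json

ORIGINAL_INPUT_KEYS = ("text", "image", "html", "input")


def stand_in_hash(data):
    # Placeholder for Prodigy's real hashing; these values will NOT match Prodigy's.
    return hashlib.md5(json.dumps(data, sort_keys=True).encode("utf8")).hexdigest()


def original_input_hash(eg):
    """Input hash computed with the original (pre-review) keys."""
    return stand_in_hash({k: eg[k] for k in ORIGINAL_INPUT_KEYS if k in eg})


def before_db(examples):
    """Attach the original-key input hash to each reviewed example before saving."""
    for eg in examples:
        # "_orig_input_hash" is a hypothetical field name chosen for this sketch.
        eg["_orig_input_hash"] = original_input_hash(eg)
    return examples


original = {"text": "Is this spam?", "options": [{"id": "SPAM"}, {"id": "HAM"}]}
reviewed = [dict(original, accept=["SPAM"], answer="accept")]
saved = before_db(reviewed)
print(saved[0]["_orig_input_hash"] == original_input_hash(original))  # True
```

Storing the value in a separate field keeps the review recipe's own hashes intact; overwriting `_input_hash` would also restore the link, but would change how the reviewed examples themselves are grouped.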
Another question/observation: when ignored and rejected answers are included in the review, I'd imagine we want to see the different answers separately, no? Yet `answer` is not in `TASK_KEYS` (and it is also among the ignored keys).
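A toy illustration of the concern (the hash values and annotator IDs are made up, and I'm not claiming this is how the review recipe groups examples internally): if annotations are grouped by `_task_hash` alone, accept, reject and ignore answers end up in the same bucket, whereas grouping by the task hash plus `answer` keeps them apart.

```python
from collections import defaultdict

# Made-up annotations of the same task by three annotators.
annotations = [
    {"_task_hash": 101, "_annotator_id": "alice", "answer": "accept"},
    {"_task_hash": 101, "_annotator_id": "bob", "answer": "reject"},
    {"_task_hash": 101, "_annotator_id": "carol", "answer": "ignore"},
]


def group(examples, key):
    """Group annotator IDs by the value returned from `key(example)`."""
    groups = defaultdict(list)
    for eg in examples:
        groups[key(eg)].append(eg["_annotator_id"])
    return dict(groups)


by_task = group(annotations, lambda eg: eg["_task_hash"])
by_task_and_answer = group(annotations, lambda eg: (eg["_task_hash"], eg["answer"]))
print(by_task)  # {101: ['alice', 'bob', 'carol']}
print(by_task_and_answer)
# {(101, 'accept'): ['alice'], (101, 'reject'): ['bob'], (101, 'ignore'): ['carol']}
```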