Hi Jim. You have stumbled upon a key name that's in the default ignore list, and I can totally imagine the confusion. Let's go over it step by step.
If the key is just "text", nothing unexpected happens.
from prodigy import set_hashes
paths = ["a", "b", "c"]
[set_hashes({"text": p}) for p in paths]
# [{'text': 'a', '_input_hash': -1808989213, '_task_hash': -1053809049},
# {'text': 'b', '_input_hash': 748838038, '_task_hash': -150095526},
# {'text': 'c', '_input_hash': -600324218, '_task_hash': 1805703639}]
The same goes for the key "paths". Note the extra comma in ("paths", ), by the way: I want input_keys to be a tuple!
[set_hashes({"paths": p}, input_keys=("paths", )) for p in paths]
# [{'paths': 'a', '_input_hash': -895075480, '_task_hash': 652594778},
# {'paths': 'b', '_input_hash': -501196447, '_task_hash': 144266571},
# {'paths': 'c', '_input_hash': -451190452, '_task_hash': 884629090}]
But once I call the key "path", every example ends up with the same hashes.
[set_hashes({"path": p}) for p in paths]
# [{'path': 'a', '_input_hash': -1979175224, '_task_hash': -1952772507},
# {'path': 'b', '_input_hash': -1979175224, '_task_hash': -1952772507},
# {'path': 'c', '_input_hash': -1979175224, '_task_hash': -1952772507}]
That's because of the ignore parameter (docs). This has the following defaults (copied from source code):
IGNORE_HASH_KEYS = ("score", "rank", "model", "source", "pattern", "priority", "path", VIEW_ID_ATTR, SESSION_ID_ATTR, ANNOTATOR_ID_ATTR, "answer")
Notice that "path" is in there.
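To make the effect concrete, here is a toy sketch of what happens when ignored keys are dropped before hashing. This is not Prodigy's actual hashing code: toy_hash and TOY_IGNORE are made-up names, the hash is a plain CRC32 over a JSON dump, and the ignore tuple only contains the plain-string keys from the list above.

```python
import json
import zlib

# Hypothetical stand-in for the default ignore list (plain-string keys only).
TOY_IGNORE = ("score", "rank", "model", "source", "pattern", "priority", "path", "answer")


def toy_hash(task, ignore=TOY_IGNORE):
    """Hash a task dict after dropping ignored keys (toy version, not Prodigy's)."""
    kept = {k: v for k, v in task.items() if k not in ignore}
    payload = json.dumps(kept, sort_keys=True).encode("utf-8")
    return zlib.crc32(payload)


paths = ["a", "b", "c"]

# With "path" ignored, every task hashes as an empty dict, so all hashes collide.
default_hashes = [toy_hash({"path": p}) for p in paths]

# With nothing ignored, the "path" value takes part in the hash again.
no_ignore_hashes = [toy_hash({"path": p}, ignore=()) for p in paths]

print(len(set(default_hashes)))    # 1
print(len(set(no_ignore_hashes)))  # 3
```

If the only key in a task is on the ignore list, there is nothing left to tell the examples apart, which matches the identical hashes in the output above.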
We can fix it by passing an empty ignore list, so that no keys are skipped:
[set_hashes({"path": p}, ignore=[]) for p in paths]
# [{'path': 'a', '_input_hash': -1378317001, '_task_hash': -1022972960},
# {'path': 'b', '_input_hash': -730857728, '_task_hash': -945938464},
# {'path': 'c', '_input_hash': -82246316, '_task_hash': -1983325868}]
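If you want to catch this kind of clash before hashing, a small check along these lines might help. It's only a sketch: ignored_keys and KNOWN_IGNORED are hypothetical names, and the set only covers the plain-string keys from the defaults quoted earlier (the three *_ATTR constants are left out because their values aren't shown here).

```python
# Plain-string keys from the default ignore list quoted in this post.
# The *_ATTR constants are omitted on purpose, since their values are not shown.
KNOWN_IGNORED = {"score", "rank", "model", "source", "pattern", "priority", "path", "answer"}


def ignored_keys(task):
    """Return the task keys that appear in the (partial) default ignore list."""
    return sorted(set(task) & KNOWN_IGNORED)


print(ignored_keys({"path": "a", "text": "hello"}))  # ['path']
print(ignored_keys({"paths": "a"}))                  # []
```

Any key this returns would be skipped during hashing unless you override ignore, so it's worth either renaming the key or passing your own ignore value.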