How does "set_hashes" work?

Hi Jim. You have stumbled upon a key name that's in the default ignore list, and I can totally imagine the confusion. Let's go over it step by step.

If the key is "text", nothing unexpected happens.

from prodigy import set_hashes 

paths = ["a", "b", "c"]

[set_hashes({"text": p}) for p in paths]
# [{'text': 'a', '_input_hash': -1808989213, '_task_hash': -1053809049},
#  {'text': 'b', '_input_hash': 748838038, '_task_hash': -150095526},
#  {'text': 'c', '_input_hash': -600324218, '_task_hash': 1805703639}]

The same goes for the key "paths". Note the extra comma, by the way: I want input_keys to be a tuple, and without the trailing comma ("paths") is just the string "paths".

[set_hashes({"paths": p}, input_keys=("paths", )) for p in paths]
# [{'paths': 'a', '_input_hash': -895075480, '_task_hash': 652594778},
#  {'paths': 'b', '_input_hash': -501196447, '_task_hash': 144266571},
#  {'paths': 'c', '_input_hash': -451190452, '_task_hash': 884629090}]
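To see why that comma matters, here's a plain-Python snippet (no Prodigy needed). Parentheses alone don't create a tuple; the trailing comma does. It assumes, as is typical for a parameter like this, that set_hashes loops over input_keys, in which case a bare string would be walked character by character.

```python
# Parentheses alone don't make a tuple; the trailing comma does.
without_comma = ("paths")
with_comma = ("paths",)

print(type(without_comma).__name__)  # str
print(type(with_comma).__name__)     # tuple

# Looping over the string yields single characters, so code that iterates
# over input_keys would look for the keys "p", "a", "t", "h" and "s".
print(list(without_comma))  # ['p', 'a', 't', 'h', 's']
print(list(with_comma))     # ['paths']
```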

But once I call the key "path", things change: every task gets the same hashes.

[set_hashes({"path": p}) for p in paths]
# [{'path': 'a', '_input_hash': -1979175224, '_task_hash': -1952772507},
#  {'path': 'b', '_input_hash': -1979175224, '_task_hash': -1952772507},
#  {'path': 'c', '_input_hash': -1979175224, '_task_hash': -1952772507}]

That's because of the ignore parameter (see the docs), which lists the task keys to leave out when computing the hashes. It has the following defaults (copied from the source code):

IGNORE_HASH_KEYS = ("score", "rank", "model", "source", "pattern", "priority", "path", VIEW_ID_ATTR, SESSION_ID_ATTR, ANNOTATOR_ID_ATTR, "answer")

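Notice that "path" is in there. Because that key is ignored, the path value never makes it into the hash, so every task ends up with the same hashes.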
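If you build tasks from your own data, a quick check can catch this before hashing. ignored_keys below is a hypothetical helper, not part of Prodigy, and KNOWN_IGNORED only covers the plain-string entries of the defaults listed above (the three *_ATTR constants are skipped because their values aren't shown here).

```python
# Plain-string entries from the IGNORE_HASH_KEYS default quoted above.
# VIEW_ID_ATTR, SESSION_ID_ATTR and ANNOTATOR_ID_ATTR are left out because
# their string values aren't shown in this thread.
KNOWN_IGNORED = (
    "score", "rank", "model", "source", "pattern", "priority", "path", "answer",
)


def ignored_keys(task, ignore=KNOWN_IGNORED):
    """Hypothetical helper: list the keys of `task` that appear in `ignore`."""
    return [key for key in task if key in ignore]


for task in [{"text": "a"}, {"paths": "a"}, {"path": "a"}]:
    print(task, "->", ignored_keys(task))
# {'text': 'a'} -> []
# {'paths': 'a'} -> []
# {'path': 'a'} -> ['path']
```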

We can fix it by passing an empty ignore list:

[set_hashes({"path": p}, ignore=[]) for p in paths]
# [{'path': 'a', '_input_hash': -1378317001, '_task_hash': -1022972960},
#  {'path': 'b', '_input_hash': -730857728, '_task_hash': -945938464},
#  {'path': 'c', '_input_hash': -82246316, '_task_hash': -1983325868}]
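Here's a toy model of why an ignored key collapses every hash to the same value. It assumes, consistent with the outputs above but not checked against Prodigy's source, that when none of the input keys are in the task, the hash falls back to all keys that aren't ignored. With "path" ignored, nothing is left, so every task hashes the empty dict. toy_input_hash, TOY_INPUT_KEYS and TOY_IGNORE are made-up names, and SHA-256 over JSON is just a convenient stand-in for whatever Prodigy actually uses.

```python
import hashlib
import json

# Hypothetical stand-ins, NOT Prodigy's real defaults or hashing.
TOY_INPUT_KEYS = ("text",)
TOY_IGNORE = ("score", "rank", "model", "source", "pattern",
              "priority", "path", "answer")


def toy_input_hash(task, input_keys=TOY_INPUT_KEYS, ignore=TOY_IGNORE):
    """Toy input hash: a simplified model, not Prodigy's implementation."""
    present = [key for key in input_keys if key in task]
    if present:
        # Hash only the input keys that are present in the task.
        data = {key: task[key] for key in present}
    else:
        # Assumed fallback: hash every key that isn't in `ignore`.
        data = {key: value for key, value in task.items() if key not in ignore}
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


paths = ["a", "b", "c"]
# "path" is ignored, so every task hashes the empty dict: identical hashes.
same = [toy_input_hash({"path": p}) for p in paths]
# Nothing ignored: the path value is hashed, so the hashes differ.
distinct = [toy_input_hash({"path": p}, ignore=()) for p in paths]
# Narrower option: keep the other defaults and drop only "path".
narrower = tuple(key for key in TOY_IGNORE if key != "path")
also_distinct = [toy_input_hash({"path": p}, ignore=narrower) for p in paths]
print(same, distinct, also_distinct, sep="\n")
```

In this model an empty ignore list switches off every default at once, so keys like "score" or "answer" would also start changing the fallback hash. Leaving out only "path" is the more targeted choice.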