Thanks for making Prodigy; it’s a great tool. I’m aware that support for multiple annotators is on the roadmap and not available yet, but I’m trying to make something work in the meantime.
Ideally, the same task wouldn’t be shown to the same annotator twice, but would still be shown to the others.
I’m trying to adapt the code from the `mark` recipe below, and I’m wondering if there is a way to hash the task after adding the annotator field to it with `task['annotator'] = annotator`.
Here is the part of `mark` that I’m hoping to modify somehow:
```python
for eg in stream:
    # A task whose hash is already in memory was answered before: count its answer
    if TASK_HASH_ATTR in eg and eg[TASK_HASH_ATTR] in memory:
        answer = memory[eg[TASK_HASH_ATTR]]
        counts[answer] += 1
```
Thanks – always nice to see what others are building with Prodigy!
To answer your question: yes, there’s a `prodigy.util.set_hashes()` helper function that does exactly that. Its arguments look like this:
| Argument | Type | Description |
| --- | --- | --- |
| `task` | dict | The annotation task to hash. |
| `input_keys` | list / tuple | The task attributes to consider when generating the input hash. Default: `('text', 'image', 'html', 'input')`. |
| `task_keys` | list / tuple | The task attributes to consider when generating the task hash. |
| `overwrite` | bool | Overwrite already existing hashes. |
| **RETURNS** | dict | The annotation task with added hashes. |
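For example, to make the annotator part of the task hash:

```python
from prodigy.util import set_hashes
# Same input and label but a different annotator now yields a different task hash
task_keys = ('spans', 'label', 'annotator')
hashed_tasks = [set_hashes(eg, task_keys=task_keys, overwrite=True) for eg in tasks]
```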
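Once the annotator is part of the task hash, the `memory` lookup in the `mark` loop from the question becomes per-annotator: the same input gets a different hash for each annotator, so it’s only skipped for the person who already answered it. Here’s a minimal, self-contained sketch of that idea, not Prodigy’s own code: `annotator_task_hash()` and `ask_questions()` are hypothetical helpers, and `hashlib.md5` stands in for whatever hash function Prodigy actually uses, so the sketch runs without Prodigy installed.

```python
import hashlib
import json
from collections import Counter


def annotator_task_hash(eg):
    # Hypothetical helper: hash the input, label and annotator together.
    # hashlib.md5 is only a stand-in for Prodigy's own hash function.
    key = json.dumps([eg.get("text"), eg.get("label"), eg.get("annotator")])
    return hashlib.md5(key.encode("utf8")).hexdigest()


def ask_questions(stream, memory, counts):
    # Yield only tasks this annotator hasn't answered yet.
    # `memory` maps task hash -> answer, like in the `mark` loop above.
    for eg in stream:
        eg["_task_hash"] = annotator_task_hash(eg)
        if eg["_task_hash"] in memory:
            # Same input, same annotator: count the earlier answer, don't re-ask
            counts[memory[eg["_task_hash"]]] += 1
        else:
            yield eg


memory, counts = {}, Counter()
alice = {"text": "hello", "label": "GREETING", "annotator": "alice"}
bob = {"text": "hello", "label": "GREETING", "annotator": "bob"}

first = list(ask_questions([alice], memory, counts))  # Alice gets the task
memory[first[0]["_task_hash"]] = "accept"              # ...and accepts it

print(len(list(ask_questions([alice], memory, counts))))  # 0: not asked again
print(len(list(ask_questions([bob], memory, counts))))    # 1: Bob still sees it
```

In a real recipe you’d let `set_hashes()` produce the hash instead of the stand-in helper; the filtering logic stays the same.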
The hashing works like this:
- If one or more of the keys are present in the task, their values are concatenated and hashed.
- If no keys are found, the full task is dumped as JSON and hashed instead.
- For the task hash, the input hash is added as a prefix to the concatenated values (or JSON dump) before hashing. This ensures that the task hash is always generated with respect to the original input.
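To make these steps concrete, here’s a rough re-implementation of the scheme in plain Python. It’s a sketch under stated assumptions, not Prodigy’s source: `sketch_set_hashes()` is a hypothetical name, the values are JSON-encoded before being concatenated, and SHA-1 from `hashlib` stands in for Prodigy’s actual hash function. It writes the results to `_input_hash` and `_task_hash`, the keys Prodigy uses for the two hashes.

```python
import hashlib
import json

INPUT_KEYS = ("text", "image", "html", "input")
HASH_KEYS = ("_input_hash", "_task_hash")


def _hash_keys(task, keys, prefix=""):
    # JSON-encode and concatenate the values of the keys that are present;
    # if none are present, dump the whole task as JSON instead
    values = [json.dumps(task[k], sort_keys=True) for k in keys if k in task]
    raw = "".join(values) if values else json.dumps(task, sort_keys=True)
    # SHA-1 is a stand-in for Prodigy's real hash function
    return hashlib.sha1((prefix + raw).encode("utf8")).hexdigest()


def sketch_set_hashes(task, task_keys, input_keys=INPUT_KEYS, overwrite=False):
    # Hypothetical re-implementation of the scheme, not Prodigy's source
    if not overwrite and all(k in task for k in HASH_KEYS):
        return task  # keep existing hashes
    # Leave stale hashes out of any JSON dump
    clean = {k: v for k, v in task.items() if k not in HASH_KEYS}
    input_hash = _hash_keys(clean, input_keys)
    # The input hash is the prefix, so the task hash depends on the input too
    task["_input_hash"] = input_hash
    task["_task_hash"] = _hash_keys(clean, task_keys, prefix=input_hash)
    return task


keys = ("spans", "label", "annotator")
a = sketch_set_hashes({"text": "hello", "label": "GREETING", "annotator": "alice"}, keys)
b = sketch_set_hashes({"text": "hello", "label": "GREETING", "annotator": "bob"}, keys)
print(a["_input_hash"] == b["_input_hash"])  # True: same input
print(a["_task_hash"] == b["_task_hash"])    # False: different annotator
```

Because the annotator only appears in the task keys, both annotators share one input hash, while each gets a distinct task hash.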
Another solution would be to do the annotation management upstream of Prodigy: a separate service would own a central data feed, split out the work and send it to each annotator. Inside Prodigy, you would just pull tasks from a local service and use that as the stream.
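A minimal sketch of such a central feed, assuming each task carries a unique `id` and that every task should be seen by a fixed number of different annotators. `WorkRouter` and `next_batch()` are hypothetical names; in practice this would run as its own service (for example behind a small HTTP API), and each annotator’s Prodigy stream would just yield whatever it returns.

```python
from collections import defaultdict, deque


class WorkRouter:
    """Hypothetical central feed that hands out tasks to annotators.

    Each task goes to at most `annotations_per_task` different
    annotators, and never to the same annotator twice.
    """

    def __init__(self, tasks, annotations_per_task=2):
        self.queue = deque(tasks)  # tasks that still need annotations
        self.needed = annotations_per_task
        self.assigned = defaultdict(set)  # task id -> annotators who got it

    def next_batch(self, annotator, size=10):
        batch, requeue = [], []
        while self.queue and len(batch) < size:
            task = self.queue.popleft()
            seen_by = self.assigned[task["id"]]
            if annotator in seen_by:
                requeue.append(task)  # this annotator already has it
                continue
            seen_by.add(annotator)
            batch.append(task)
            if len(seen_by) < self.needed:
                requeue.append(task)  # other annotators still need to see it
        # Put unfinished tasks back at the front, preserving their order
        self.queue.extendleft(reversed(requeue))
        return batch


router = WorkRouter([{"id": i, "text": f"doc {i}"} for i in range(3)])
print([t["id"] for t in router.next_batch("alice")])  # [0, 1, 2]
print([t["id"] for t in router.next_batch("alice")])  # []
print([t["id"] for t in router.next_batch("bob")])    # [0, 1, 2]
print([t["id"] for t in router.next_batch("carol")])  # []
```

Keeping the assignment state in one place is what guarantees nobody sees a task twice, while each Prodigy instance stays unaware of the other annotators.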
This is probably how we’ll end up doing things, because we think the annotation management system should really be a separate tool. If it were inside Prodigy, it would be an entirely separate subcommand that runs as a service. It seems much clearer to break it out into its own executable.