Multiple annotators without personal repetition

Thanks for making Prodigy – it’s a great tool. I’m aware that multi-annotator support is on the roadmap and not yet available, but I’m trying to make something work in the meantime.

It would be great if the same task weren’t asked of the same annotator twice, but was still shown to the other annotators.

I’m trying to adapt the code from the mark recipe below, and wondering if there is a way to re-hash the task after adding an annotator field, e.g. `task['annotator'] = annotator`.
Here is the mark recipe code that I’m hoping to modify somehow:

        for eg in stream:
            # If the task's hash is already in memory, it's been answered
            # before – record the answer instead of asking again
            if TASK_HASH_ATTR in eg and eg[TASK_HASH_ATTR] in memory:
                answer = memory[eg[TASK_HASH_ATTR]]
                counts[answer] += 1
            else:
                yield eg

Thanks – always nice to see what others are building with Prodigy!

To answer your question: Yes, there’s a prodigy.util.set_hashes() helper function that does exactly that. It looks like this:

| Argument | Type | Description |
| --- | --- | --- |
| `task` | dict | The annotation task to hash. |
| `input_keys` | list / tuple | The task attributes to consider when generating the input hash. Default: `('text', 'image', 'html', 'input')`. |
| `task_keys` | list / tuple | The task attributes to consider when generating the task hash. Default: `('spans', 'label')`. |
| `overwrite` | bool | Overwrite already existing hashes. |
| **RETURNS** | dict | The annotation task with added hashes. |

For example:

    task_keys = ('spans', 'label', 'annotator')
    hashed_tasks = [set_hashes(eg, task_keys=task_keys, overwrite=True) for eg in tasks]
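To see why adding `'annotator'` to the `task_keys` does what you want, here’s a runnable sketch of the idea. `set_hashes_sketch` is a hypothetical stand-in for `prodigy.util.set_hashes`, using `hashlib` purely for illustration (Prodigy itself uses mmh3) – the point is just that the same task gets a different task hash per annotator:

```python
import hashlib

TASK_HASH_ATTR = "_task_hash"  # stand-in for prodigy.util's attribute

def set_hashes_sketch(task, task_keys=("spans", "label", "annotator")):
    # Concatenate the values of whichever task_keys are present and hash
    # them. (Illustrative only: real set_hashes uses mmh3 and also sets
    # an input hash.)
    values = "".join(str(task[key]) for key in task_keys if key in task)
    task[TASK_HASH_ATTR] = int(hashlib.md5(values.encode("utf8")).hexdigest(), 16) % (2 ** 31)
    return task

task = {"text": "hello world", "label": "GREETING"}
hashes = set()
for annotator in ("alice", "bob"):
    eg = set_hashes_sketch(dict(task, annotator=annotator))
    hashes.add(eg[TASK_HASH_ATTR])
```

With a per-annotator hash, the mark recipe’s memory check will only skip a task for the annotator who already answered it, while everyone else still sees it.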

The hashing works like this:

  • If one or more of the keys are present in the task, their values are concatenated and hashed using mmh3.
  • If no keys are found, the full task is dumped as JSON and hashed instead.
  • For the task hash, the input hash is added as a prefix to the concatenated values (or JSON dump) before hashing. This ensures that the task hash is always generated with respect to the original input.
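The three rules above can be sketched in a few lines. This is a simplified illustration, not Prodigy’s implementation – `get_hashes` is a hypothetical helper and `hashlib.md5` stands in for mmh3:

```python
import hashlib
import json

def _hash(value):
    # Illustrative hash function; Prodigy itself uses mmh3
    return int(hashlib.md5(value.encode("utf8")).hexdigest(), 16) % (2 ** 31)

def get_hashes(task, input_keys=("text", "image", "html", "input"),
               task_keys=("spans", "label")):
    # Input hash: concatenate the values of whichever input keys are
    # present, falling back to the full task dumped as JSON if none are
    input_values = "".join(str(task[k]) for k in input_keys if k in task)
    input_hash = _hash(input_values or json.dumps(task))
    # Task hash: the input hash is added as a prefix to the concatenated
    # task values, so it's always generated with respect to the input
    task_values = "".join(str(task[k]) for k in task_keys if k in task)
    task_hash = _hash(str(input_hash) + (task_values or json.dumps(task)))
    return input_hash, task_hash

a = get_hashes({"text": "hello", "label": "A"})
b = get_hashes({"text": "hello", "label": "B"})
# Same text gives the same input hash; different labels give
# different task hashes
```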

Another solution would be to do the annotation management upstream of Prodigy: you would have another service which owned a central data feed, which would split out work and send it to each annotator. Inside Prodigy, you would just be pulling tasks from a local service and using that as the stream.
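As a minimal sketch of that upstream approach (all names here are hypothetical, and an in-memory dict stands in for the central service – in practice it would be e.g. a local HTTP endpoint):

```python
from itertools import cycle

def split_feed(tasks, annotators):
    # Central feed: round-robin each task to one annotator's queue.
    # This logic would live in a separate service, outside Prodigy.
    queues = {name: [] for name in annotators}
    for task, name in zip(tasks, cycle(annotators)):
        queues[name].append(task)
    return queues

def stream_for(queues, annotator):
    # Inside Prodigy, the recipe's stream just pulls this annotator's
    # tasks from the service
    yield from queues[annotator]

tasks = [{"text": "example %d" % i} for i in range(6)]
queues = split_feed(tasks, ["alice", "bob"])
alice_texts = [eg["text"] for eg in stream_for(queues, "alice")]
```

The round-robin split is just one policy; the same structure works if the feed sends each task to several annotators instead of exactly one.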

This is probably how we’ll end up doing things, because we think the annotation management system should really be a separate tool. If it were inside Prodigy it would be an entirely separate subcommand that ran as a service. It seems much clearer to break it out into its own executable.