Hi @ale,
The observed difference between these two overlap settings comes down to how work stealing is applied. With `feed_overlap`, work stealing is applied until there are no unsaved examples left in any of the sessions. That is why you could get to the end of the queue annotating with `joe` only. With `annotations_per_task`, work stealing is applied only until the estimated target for `joe` has been reached, which would be ca. 75 examples (given the 150 inputs). The remaining questions stay queued for `jane` until she gets to them.
The idea behind the `annotations_per_task` setting is to approximate this target as closely as possible given the available pool of annotators. If more than one annotation per task is configured, it is expected that there will be enough active annotators to reach that target. If you can't assume there will be enough active annotators to reach the target, it's better to use the `feed_overlap` setting (especially if you have 2 annotators and want one annotator to annotate whatever the other annotator didn't).
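For reference, these settings live in `prodigy.json` (or per-recipe overrides). A minimal sketch of the second setup, assuming the config keys as named in the docs:

```json
{
    "annotations_per_task": 1.2
}
```

The first setup would instead set `"feed_overlap": true`, with the sessions provided via `PRODIGY_ALLOWED_SESSIONS` as usual.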
Regarding your second experiment with 1.2 annotations per task:
The 1.2 condition is applied on each pull from the main queue. Every time `joe` asks for questions, the task router first tries to satisfy the whole-number part of the fraction, i.e. 1. Then, to handle the fractional part, i.e. 0.2, it computes a probability that determines where the task should be sent. Effectively, some tasks are sent to `joe`, some to `jane` and some to both.
If only `joe` is annotating, the probability of sending the task to `jane` would increase, given the 1.2 condition and the fact that she is expected to annotate as well. This is why `joe` is allowed to steal less and less as he progresses (and why you're observing numbers smaller than the batch size).
It's a probability-based mechanism because it's hard to know upfront how many tasks should be sent in total to one annotator or another. Since the input files can be huge or of undefined size, the Controller cannot know the total upfront, which is why it has to be estimated on a per-batch basis. In other words, the task router makes "local" decisions, trying to fulfill the conditions as best as possible given the current state.
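To make the fractional mechanism concrete, here's a minimal standalone sketch of the idea (not Prodigy's actual implementation): every task goes to the whole-number count of annotators, and the fractional part becomes the probability of adding one more.

```python
import random

def route_fractional(sessions, annotations_per_task, rng):
    """Pick which sessions receive a task, approximating a fractional
    annotations-per-task target. Illustrative sketch only."""
    whole, frac = divmod(annotations_per_task, 1)
    # Every task goes to at least `whole` annotators...
    chosen = rng.sample(sessions, k=int(whole))
    # ...and with probability `frac` it goes to one more.
    if frac > 0 and rng.random() < frac:
        extra = [s for s in sessions if s not in chosen]
        if extra:
            chosen.append(rng.choice(extra))
    return chosen

rng = random.Random(0)
sessions = ["joe", "jane"]
routes = [route_fractional(sessions, 1.2, rng) for _ in range(1000)]
avg = sum(len(r) for r in routes) / len(routes)
print(round(avg, 2))  # close to 1.2 on average
```

Over many tasks the average number of annotations per task converges to the configured value, even though any individual task gets either 1 or 2.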
This post explains the mechanism a bit more: How does `annotations_per_task : 2.5` work.
> Is there a way to ensure that the 30 examples that will have 2 annotations get such annotations from different annotators (in this case, `jane` and `joe`)?
The multiple annotations resulting from task router settings (`feed_overlap` or `annotations_per_task`) always come from different annotators.
As mentioned above, the fractional value of `annotations_per_task` in particular is the number of annotations per task that you can expect on average. The task router will try to meet this target as best as possible based on the progress of the annotations. If all annotators defined in `ALLOWED_SESSIONS` are active and there is a large enough number of annotations (cf. the probability-based assignment), the final numbers will converge to the setting.
If you'd rather have a more deterministic router because you can afford to precompute the total (or even the queues), you can always implement a custom task router. See here for some examples of how it can be done: Task Routing · Prodigy · An annotation tool for AI, Machine Learning & NLP
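As a starting point, here's a sketch in the shape the task-routing docs describe: a callable taking `(ctrl, session_id, item)` and returning the list of session names that should annotate the item. The annotator names are the hypothetical ones from this thread, and the routing logic is just an illustration.

```python
import zlib

# Hypothetical full session names, matching the examples in this thread.
ANNOTATORS = ["jane", "joe"]

def task_router_double(ctrl, session_id, item):
    """Send every task to both annotators: exactly 2 annotations per
    task, always from different people."""
    return list(ANNOTATORS)

def task_router_partial(ctrl, session_id, item):
    """A deterministic take on ~1.2 annotations per task: a stable hash
    of the input text sends every 5th task to both annotators, the rest
    only to the session that asked for it."""
    if zlib.crc32(item["text"].encode("utf8")) % 5 == 0:
        return list(ANNOTATORS)
    return [session_id]
```

In a recipe you'd return such a function under the `"task_router"` key of the components dictionary (per the task routing docs linked above); the `ctrl` argument is the Controller, which this sketch doesn't need to use.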