Intra-annotator agreement using custom task router

Hi @miguelclaramunt,

If it's possible, I want to know if this behaviour will interfere with the conditions stated in my previous question and if so, how this can be avoided. I was planning on implementing condition #2: Stop after 1 accept AND 1 reject for this category; this is a cumulative condition across all annotations for this category.

It sounds like the condition "Stop after 1 accept AND 1 reject for this category" and the repeat probability discussed here might conflict. Imagine you set question A to be sent to Alex 2 times, but you also want to stop routing this task once you have 1 accept and 1 reject for it. If the condition is met before Alex has had the chance to see the task again, they won't see it a second time.

You need to decide how you want to reconcile the conflict and program this logic into the router. For that, apart from the global category stats, you also need to keep session-internal stats and make routing decisions by taking both into account.
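To make the reconciliation concrete, here is a minimal sketch of one possible policy: repeats still owed to a session take priority over the category stop rule. All names here (global_counts, pending_repeats, should_route) are hypothetical and not part of Prodigy, and the stats are kept in plain in-memory dictionaries purely for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stats; a real router would have to keep these
# up to date as answers arrive.
global_counts = defaultdict(lambda: {"accept": 0, "reject": 0})
pending_repeats = defaultdict(int)  # (session_id, task_id) -> repeats still owed


def category_is_done(category):
    """Global rule: stop after at least 1 accept AND 1 reject for the category."""
    counts = global_counts[category]
    return counts["accept"] >= 1 and counts["reject"] >= 1


def should_route(session_id, task_id, category):
    """One possible policy: owed repeats override the category stop rule."""
    if pending_repeats[(session_id, task_id)] > 0:
        return True
    return not category_is_done(category)
```

If you would rather have the stop condition win, swap the order of the two checks; the point is only that both sources of state are consulted in one place.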

Also, passing a custom router to the controller replaces the built-in router, so "annotations_per_task": 1.5 set in the global prodigy.json will no longer apply. To keep that behaviour, you'd need to take the route_average_per_task router as the starting point for your custom router. As I mentioned before, the source code for this router is available in components/router.py.

Finally, I think that for complex routing procedures like this one, which require task buffering and resending based on both session-internal and global conditions, it's probably easier to distribute tasks per annotator in the desired way a priori. Buffering tasks and checking how many examples have been annotated in the meantime would be especially difficult in Prodigy's streaming design.

When creating the input file, you would insert the desired number of repeated questions, spacing them as desired. You could add a "target_session" field in the meta of each task and have a simple router that routes based on that information (similar to this one). This router could also apply the restrictions about not sending the task again if some conditions are met. For example, if it's not an "intra-annotator" repetition task, the stopping condition could apply. If instead it is an "intra-annotator" repetition task, the stopping condition should be lifted. You might want to add a flag to the meta indicating whether a task is a repetition, to keep the function simple.
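A rough sketch of such a router could look like the following. It assumes the usual custom-router shape of a function receiving the controller, the requesting session ID and the task, and returning a list of session IDs; it also assumes that an empty list means the task is sent to nobody. The "is_repetition" and "category" meta fields and the category_counts dictionary are hypothetical, and "target_session" is expected to hold whatever session identifier your setup uses.

```python
from typing import Any, Dict, List

# Hypothetical global stats the recipe would update as answers come in.
category_counts: Dict[str, Dict[str, int]] = {}


def stop_condition_met(category: str) -> bool:
    """True once the category has at least 1 accept and 1 reject."""
    counts = category_counts.get(category, {})
    return counts.get("accept", 0) >= 1 and counts.get("reject", 0) >= 1


def route_by_target_session(ctrl: Any, session_id: str, item: Dict[str, Any]) -> List[str]:
    """Route a task to the session named in its meta (ctrl is unused here)."""
    meta = item.get("meta", {})
    target = meta.get("target_session")
    if target is None:
        return []
    # The stop rule applies only to regular tasks; repetition tasks bypass it.
    is_repetition = meta.get("is_repetition", False)
    if not is_repetition and stop_condition_met(meta.get("category", "")):
        return []
    return [target]
```

Because the routing target is read from the task itself, the router stays stateless apart from the category counts.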

In other words, you would implement the repetition and spacing for the intra-annotator agreement by "curating" the input file outside Prodigy, and then you'd implement the category-based conditions via the router as discussed here, with added reconciliation logic in case the rules conflict.
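The curation step could be sketched along these lines. This is only an illustration: the "id" field, the repeat_ids set, the gap parameter and the "is_repetition" flag are all hypothetical, and the function simply re-inserts a copy of each selected task once gap further tasks have been emitted for that annotator.

```python
import copy


def curate_for_session(tasks, session, repeat_ids, gap):
    """Return a per-annotator task list with spaced repeats of selected tasks."""
    curated, waiting = [], []  # waiting holds [countdown, task] pairs
    for task in tasks:
        first = copy.deepcopy(task)
        first.setdefault("meta", {}).update(
            {"target_session": session, "is_repetition": False}
        )
        curated.append(first)
        # Count down scheduled repeats and emit the ones that are now due.
        for entry in waiting:
            entry[0] -= 1
        for entry in [e for e in waiting if e[0] <= 0]:
            curated.append(entry[1])
            waiting.remove(entry)
        if task["id"] in repeat_ids:
            again = copy.deepcopy(first)
            again["meta"]["is_repetition"] = True
            waiting.append([gap, again])
    # Flush repeats that never came due because the input ran out.
    curated.extend(entry[1] for entry in waiting)
    return curated
```

You would run this once per annotator and write the combined result to the JSONL file that Prodigy loads.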

If you decide to go this route, you'll also need a custom hashing function that takes into account the information that distinguishes the repeated tasks, so that the deduplication mechanism won't skip them. You can see more details on showing the same questions multiple times to the same annotator in this post.
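As an illustration of the idea, the sketch below assigns the "_input_hash" and "_task_hash" fields by hand, folding a hypothetical "repeat_index" meta field and the target session into the task hash. It assumes the hashes are stored as integers on the task and that deduplication is done on the task hash; the helper names and the use of hashlib are my own choices, not Prodigy's hashing implementation.

```python
import hashlib
import json


def _hash32(payload):
    """Stable 32-bit integer hash of a JSON-serialisable payload."""
    raw = json.dumps(payload, sort_keys=True).encode("utf8")
    return int.from_bytes(hashlib.sha1(raw).digest()[:4], "big")


def set_repeat_aware_hashes(task):
    """Give repeated tasks the same input hash but distinct task hashes."""
    meta = task.get("meta", {})
    input_part = {"text": task.get("text")}
    task_part = {
        **input_part,
        "label": task.get("label"),
        "target_session": meta.get("target_session"),
        "repeat_index": meta.get("repeat_index", 0),  # hypothetical field
    }
    task["_input_hash"] = _hash32(input_part)
    task["_task_hash"] = _hash32(task_part)
    return task
```

Keeping the input hash identical across repeats also makes it easy to pair up the first and second answers later when computing intra-annotator agreement.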
