Task routing: ensuring n annotations without specifying sessions upfront

Based on my review of the documentation on task routing (including the video tutorial), it looks like annotations_per_task will only guarantee at least n annotations per task when the sessions are defined in advance via PRODIGY_ALLOWED_SESSIONS. Is this correct?

If so, how would you recommend achieving this behavior? In this particular scenario, each task should be annotated at least 2 times (by two distinct annotators / sessions), but the full set of annotators / sessions is not known in advance. More specifically, we expect to have three "primary" annotators plus an additional set of "secondary" annotators (not known in advance) who will annotate in a more "burst"-y fashion.

Hi @laurejt,

Yes, that's correct. The reason n is not guaranteed is that the task router can only route to annotators it already has a session for. If we don't list the annotators upfront, the first batch will only be routed to the first annotator. If a second annotator joins immediately after, the router will send the tasks to two annotators from then on (with work allocated to new annotators as they join).
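To make the first-batch problem concrete, here is a toy sketch in plain Python. It is not Prodigy's routing code: `route_task` is a hypothetical helper, and the only assumption it encodes is the one described above, namely that a router can only pick among sessions that exist at routing time.

```python
from typing import List


def route_task(known_sessions: List[str], annotations_per_task: int) -> List[str]:
    """Hypothetical toy router: pick up to `annotations_per_task` known sessions."""
    # The router cannot assign a task to a session it has never seen.
    n = min(annotations_per_task, len(known_sessions))
    return known_sessions[:n]


# Batch 1: only "alice" has opened the app, so each task gets one annotator.
first_batch = [route_task(["alice"], 2) for _ in range(3)]

# Batch 2: "bob" has joined in the meantime, so each task gets two annotators.
second_batch = [route_task(["alice", "bob"], 2) for _ in range(3)]

print(first_batch)
print(second_batch)
```

In this toy model, every task in the first batch ends up with a single annotator even though two were requested, which is the gap that pre-registering sessions closes.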

To avoid this problem with the first batch (or until at least two sessions have been registered), we can modify the recipe to create a couple of sessions upfront. This way even the first batch will be sent to two annotators. Annotators can also be added later.
To add sessions at the recipe level, we have to interact directly with the Controller object. That is: 1) initialize an instance of the Controller, 2) add the "initial sessions", and 3) return the Controller instance from the recipe.

Here's a minimal example of what such a recipe would look like:

import prodigy
from prodigy.components.stream import get_stream
from prodigy.core import Controller
from prodigy.protocols import ControllerComponentsDict


@prodigy.recipe(
    "minimal.recipe",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
)
def minimal_recipe(
    dataset: str,
    source: str,
):

    stream = get_stream(source)
    components: ControllerComponentsDict = {
        "stream": stream,
        "view_id": "classification",
        "dataset": dataset,
    } # this is the dictionary that you'd normally return from a recipe function

    ctrl = Controller.from_components("minimal.recipe", components)
    initial_sessions = ["bob", "alice", "sam"]
    for session_name in initial_sessions:
        # get the Prodigy-formatted session name (prefixed with the dataset name)
        session_id = ctrl.get_session_name(session_name)
        # register the session with the Controller so the router can see it
        ctrl.confirm_session(session_id)
    return ctrl

This way the task router will have the minimum required pool of annotators from the get-go, and even if these "initial" annotators drop off later in the project, the work stealing mechanism will reroute their tasks.
We actually had to fix a small regression in Prodigy to make sure it is permitted to return the Controller object from the recipe, so please upgrade to Prodigy 1.15.8 before trying this out.
Hopefully, that solves the issue for you. Let me know if you need any assistance at all with the custom recipe for this.