Hi @laurejt,
Yes, that's correct. The reason the n is not guaranteed is that the task router can only route to annotators we have a session for. If we don't list the annotators upfront, the first batch will be routed only to the first annotator. If a second annotator joins immediately after, the router will send tasks to two annotators from then on (with work allocated to new annotators as they join).
To avoid this problem with the first batch (or until at least two sessions have been registered), we can modify the recipe to create a couple of sessions up-front. This way even the first batch will be sent to two annotators. Annotators can also be added later.
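To make the first-batch behaviour concrete, here is a toy simulation. It is not Prodigy's actual router: `route_batch` and its round-robin strategy are hypothetical, and the only thing it models is that tasks can go only to sessions the router already knows about.

```python
from typing import Dict, List


def route_batch(
    task_ids: List[str],
    known_sessions: List[str],
    annotations_per_task: int = 2,
) -> Dict[str, List[str]]:
    """Toy router (hypothetical): assign each task to up to
    `annotations_per_task` of the sessions known at routing time."""
    assignments: Dict[str, List[str]] = {}
    # We can never assign to more annotators than we have sessions for.
    k = min(annotations_per_task, len(known_sessions))
    for i, task_id in enumerate(task_ids):
        # Rotate through the pool so the work is spread across sessions.
        assignments[task_id] = [
            known_sessions[(i + j) % len(known_sessions)] for j in range(k)
        ]
    return assignments


first_batch = ["t1", "t2", "t3"]

# Only one session exists when the first batch is routed.
only_bob = route_batch(first_batch, ["bob"])

# Sessions were created up-front, so the pool is already large enough.
preregistered = route_batch(first_batch, ["bob", "alice", "sam"])
```

With a single known session every task in the batch ends up with one annotator, whereas with the pre-registered pool each task gets two distinct annotators.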
To add sessions at the recipe level, we have to interact directly with the Controller object. That is: 1) initialize the Controller instance, 2) add the "initial sessions", and 3) return the Controller instance from the recipe.
Here's a minimal example of what such a recipe would look like:
```python
import prodigy
from prodigy.components.stream import get_stream
from prodigy.core import Controller
from prodigy.protocols import ControllerComponentsDict


@prodigy.recipe(
    "minimal.recipe",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
)
def minimal_recipe(
    dataset: str,
    source: str,
):
    stream = get_stream(source)
    # This is the dictionary that you'd normally return from a recipe function
    components: ControllerComponentsDict = {
        "stream": stream,
        "view_id": "classification",
        "dataset": dataset,
    }
    ctrl = Controller.from_components("minimal.recipe", components)
    initial_sessions = ["bob", "alice", "sam"]
    for session_name in initial_sessions:
        # Get the Prodigy format of the session name (with the dataset name prefix)
        session_id = ctrl.get_session_name(session_name)
        # Initialize the session in the Controller
        ctrl.confirm_session(session_id)
    return ctrl
```
This way the task router will have the minimum required pool of annotators from the get-go, and even if these "initial" annotators drop off later in the project, the work stealing mechanism will reroute their tasks.
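If you end up registering sessions in more than one recipe, the loop can be pulled out into a small helper. This is only a sketch: `register_initial_sessions` is a hypothetical name, it relies solely on the two Controller methods used in the recipe above, and `FakeController` (including its `dataset-name` prefix format) is a stand-in I'm assuming here so the helper can be tried without starting Prodigy.

```python
from typing import Iterable, List


def register_initial_sessions(ctrl, names: Iterable[str]) -> List[str]:
    """Hypothetical helper: register each name once and return the session IDs.

    `ctrl` is expected to expose `get_session_name` and `confirm_session`,
    as the Controller does in the recipe above.
    """
    registered: List[str] = []
    # dict.fromkeys drops duplicate names while keeping their order.
    for name in dict.fromkeys(names):
        session_id = ctrl.get_session_name(name)
        ctrl.confirm_session(session_id)
        registered.append(session_id)
    return registered


class FakeController:
    """Test stand-in only, NOT Prodigy's Controller. The prefix format
    below is an assumption made for illustration."""

    def __init__(self, dataset: str) -> None:
        self.dataset = dataset
        self.sessions: List[str] = []

    def get_session_name(self, name: str) -> str:
        return f"{self.dataset}-{name}"

    def confirm_session(self, session_id: str) -> None:
        self.sessions.append(session_id)


fake = FakeController("my_dataset")
session_ids = register_initial_sessions(fake, ["bob", "alice", "bob"])
```

In the recipe itself, the `for` loop would then become a single call such as `register_initial_sessions(ctrl, ["bob", "alice", "sam"])` before `return ctrl`.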
We actually had to fix a small regression in Prodigy to make sure returning the Controller object from a recipe is permitted, so please upgrade to Prodigy 1.15.8 before trying this out.
Hopefully, that solves the issue for you. Let me know if you need any assistance at all with the custom recipe for this.