Prodigy Annotation Task Allocation Issue with Multi-Session Setup

Hi @miguelclaramunt !

Thanks for the detailed report — the behavior you’re seeing is actually consistent with how Prodigy currently makes routing decisions in multi-annotator setups.

  1. When you set PRODIGY_ALLOWED_SESSIONS, the router (route_average_per_task) pre-assigns tasks to the session names in that list. These assignments are essentially reserved in the main stream. A task only becomes "open" (and thus stealable) when an annotator with that session ID connects and is served the task. The task is then moved to that session's _open_tasks list, as you can see in the get_questions method of the Session class (prodigy/components/session.py).

  2. The steal_work function is designed to take tasks from other sessions that are idle. It does this exclusively by iterating through the _open_tasks of other active Session objects.

  3. If an annotator in your PRODIGY_ALLOWED_SESSIONS list never connects, the tasks assigned to them remain in the main stream but never enter anyone's _open_tasks list. Consequently, they cannot be stolen. This is precisely the behavior you're observing (see the simplified sketch after this list).
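
To make the mechanics a bit more concrete, here's a heavily simplified, self-contained sketch of the idea. It is not Prodigy's code; the names only mirror the ones mentioned above (Session, _open_tasks, steal_work) to show why reservations for never-connecting sessions stay locked:

# Simplified illustration only; NOT Prodigy's actual implementation.

class Session:
    def __init__(self, name):
        self.name = name
        self._open_tasks = []  # filled only when this session requests work

    def get_questions(self, reserved):
        # Reserved tasks move from the main stream into _open_tasks
        # only once the session connects and asks for a batch.
        self._open_tasks.extend(reserved.pop(self.name, []))
        return list(self._open_tasks)


def steal_work(requester, active_sessions):
    # Stealing only iterates the _open_tasks of existing Session objects;
    # reservations for sessions that never connected are invisible here.
    stolen = []
    for other in active_sessions:
        if other is not requester and other._open_tasks:
            stolen.append(other._open_tasks.pop(0))
    return stolen


# Per-session reservations made up front by the router:
reserved = {"anna": ["t1", "t2"], "marina": ["t3", "t4"]}
active_sessions = []

anna = Session("anna")            # anna connects...
active_sessions.append(anna)
anna.get_questions(reserved)      # ...so her tasks become open (stealable)

miguel = Session("miguel")        # miguel connects with no reservation
active_sessions.append(miguel)
print(steal_work(miguel, active_sessions))  # ['t1'] stolen from anna
print(reserved)  # {'marina': ['t3', 't4']} stays locked until marina connects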

This was discussed in more detail in an older thread. A couple of relevant points from there:

  • With PRODIGY_ALLOWED_SESSIONS, the router can plan more precisely, but it also assumes that all declared sessions will eventually connect. If some stay inactive, their share of tasks can remain locked.
  • Most importantly, it’s not obvious when a session should be considered “deprecated,” so the system errs on the side of keeping its reservations.
  • Work stealing only redistributes when a session “wakes up,” not proactively in the background. This is perhaps something we could make configurable in future versions.

As suggested in the thread I mentioned, you could work around the absent-but-registered sessions by running a small session-initialization script that calls /get_session_questions once for each session right after launching the Prodigy server. By simulating a connection from each allowed annotator, you create Session objects for them, and by having each of them request one batch of tasks, you populate their _open_tasks, making those tasks available for stealing.
Here's an example of such a script:

import requests

PRODIGY_URL = "http://localhost:8080"

# Must match dataset-session_name pattern
SESSIONS = ["test-anna", "test-marina", "test-rosa", "test-miguel", "test-francesca"]

def initialize_sessions():
    """
    Call /get_session_questions once for each allowed session to
    populate their queues and trigger the router.
    """
    for session in SESSIONS:
        try:
            r = requests.post(
                f"{PRODIGY_URL}/get_session_questions",
                json={"session_id": session},
                timeout=10,
            )
            if r.status_code == 200:
                tasks = r.json().get("tasks", [])
                print(f"Session {session}: {len(tasks)} tasks initialized")
            else:
                print(f"Session {session} failed: {r.status_code} {r.text}")
        except requests.RequestException as e:
            print(f"Session {session} error: {e}")

if __name__ == "__main__":
    initialize_sessions()

Please note that in the POST request you should use the dataset-session_name pattern, e.g. my_dataset-anna, as this is the session ID that Prodigy creates during Controller initialization.
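
If it helps, you can also build that list programmatically; the dataset and annotator names below are just placeholders matching the example above:

DATASET = "test"  # the dataset name passed to the recipe
ANNOTATORS = ["anna", "marina", "rosa", "miguel", "francesca"]
SESSIONS = [f"{DATASET}-{name}" for name in ANNOTATORS]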
