Hi @miguelclaramunt !
Thanks for the detailed report — the behavior you’re seeing is actually consistent with how Prodigy currently makes routing decisions in multi-annotator setups.
- When you set `PRODIGY_ALLOWED_SESSIONS`, the router (`route_average_per_task`) pre-assigns tasks to the session names in that list. These assignments are essentially reserved in the main stream. A task only becomes "open" (and thus stealable) when an annotator with that session ID connects and is served the task. The task is then moved to that session's `_open_tasks` list, as you can see in the `get_questions` method of the `Session` class (`prodigy/components/session.py`).
- The `steal_work` function is designed to take tasks from other sessions that are idle. It does this exclusively by iterating through the `_open_tasks` of other active `Session` objects.
- If an annotator in your `PRODIGY_ALLOWED_SESSIONS` list never connects, the tasks assigned to them remain in the main stream and never enter anyone's `_open_tasks` list. Consequently, they cannot be stolen. This is precisely the behavior you're observing.
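To make the mechanism concrete, here's a deliberately simplified toy model of the flow described above. It is not Prodigy's actual implementation (the names `main_stream`, `open_tasks`, `connect`, and the task strings are all invented for illustration); it only shows why reserved tasks for a session that never connects are invisible to work stealing:

```python
# Toy model (NOT Prodigy's real code) of reserved tasks vs. open tasks.
main_stream = {  # tasks pre-assigned (reserved) per allowed session
    "test-anna": ["task1", "task2"],
    "test-miguel": ["task3", "task4"],  # this annotator never connects
}

open_tasks = {}  # session_id -> tasks served to a *connected* session

def connect(session_id):
    # Connecting moves a session's reserved tasks into its open list,
    # which is the only pool that work stealing inspects.
    open_tasks[session_id] = main_stream.pop(session_id, [])

def steal_work(thief):
    # Stealing iterates over OTHER sessions' open tasks only; reserved
    # tasks still sitting in the main stream are never considered.
    stolen = []
    for session_id, tasks in open_tasks.items():
        if session_id != thief and tasks:
            stolen.append(tasks.pop())
    return stolen

connect("test-anna")            # anna connects; her tasks become open
print(steal_work("test-anna"))  # prints [] - miguel's tasks were never opened
```

Once `connect("test-miguel")` runs (which is exactly what the initialization script below simulates via the API), his tasks enter `open_tasks` and become stealable.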
This was discussed in more detail in an older thread. A couple of relevant points from there:
- With `PRODIGY_ALLOWED_SESSIONS`, the router can plan more precisely, but it also assumes that all declared sessions will eventually connect. If some stay inactive, their share of tasks can remain locked.
- Most importantly, it's not obvious when a session should be considered "deprecated," so the system errs on the side of keeping its reservations.
- Work stealing only redistributes when a session "wakes up," not proactively in the background. This is perhaps one thing we could make configurable in future versions.
As suggested in the thread I mentioned, you could work around the absent but registered sessions by running a session-initialization script that calls `/get_session_questions` once for each session right after launching the Prodigy server. By simulating a connection from each allowed annotator, you create `Session` objects for them and, by having them request one batch of tasks, you populate their `_open_tasks`, making those tasks available for stealing.
Here's an example of such a script:
```python
import requests

PRODIGY_URL = "http://localhost:8080"

# Must match the dataset-session_name pattern
SESSIONS = ["test-anna", "test-marina", "test-rosa", "test-miguel", "test-francesca"]

def initialize_sessions():
    """
    Call /get_session_questions once for each allowed session to
    populate their queues and trigger the router.
    """
    for session in SESSIONS:
        try:
            r = requests.post(
                f"{PRODIGY_URL}/get_session_questions",
                json={"session_id": session},
            )
            if r.status_code == 200:
                tasks = r.json().get("tasks", [])
                print(f"Session {session}: {len(tasks)} tasks initialized")
            else:
                print(f"Session {session} failed: {r.status_code} {r.text}")
        except Exception as e:
            print(f"Session {session} error: {e}")

if __name__ == "__main__":
    initialize_sessions()
```
Please note that in the POST request you should use the dataset-session_name pattern, e.g. `my_dataset-anna`, as this is the session ID that Prodigy creates during the `Controller` initialization.
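For example, the `SESSIONS` list can be built from your dataset name and annotator names like this (the dataset name `my_dataset` and the annotator names below are placeholders; substitute your own values):

```python
# Build session IDs following the dataset-session_name pattern.
# "my_dataset" and the annotator names are placeholder values.
dataset = "my_dataset"
annotators = ["anna", "marina", "rosa"]

session_ids = [f"{dataset}-{name}" for name in annotators]
print(session_ids)  # ['my_dataset-anna', 'my_dataset-marina', 'my_dataset-rosa']
```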