Prodigy Annotation Task Allocation Issue with Multi-Session Setup

Hi everyone,

I wanted to reach out to the community to see if anyone can shed light on a recurring issue we observed with task allocation in a multi-annotator setup.

:memo: Incident Summary

  • Dataset: ~5,000 samples (~6,300 annotations estimated, 20% overlap)

  • Active annotators: anna, marina

  • Configured sessions: miguel, francesca, rosa, anna, marina

  • Relevant config: work stealing enabled (by default)

At several points, one annotator’s session (anna) would stop receiving tasks and display:

"No tasks available. Make sure to save your progress. If working in multi-annotator scenario, try reloading the page to see if more tasks have become accessible."

Despite unannotated samples still being available, Anna’s /get_session_questions endpoint returned an empty array of tasks.

This would resolve only when:

  • Another annotator (marina) refreshed their session, or

  • We opened a previously unused session (e.g. rosa), which immediately “unblocked” Anna’s session.


:date: Detailed Timeline

  • At ~2.1k annotated samples, Anna got “No tasks available.” Refreshing Marina’s session allowed her to continue. Days later, multiple intermittent occurrences; refreshing Marina’s session always fixed it.

  • Anna was blocked again, but this time Marina had no session open. Opening the rosa session for the first time “released” tasks and let Anna continue.


:magnifying_glass_tilted_left: Observations

  • /get_session_questions returned:
{
  "tasks": [],
  "total": 2410,
  "progress": null,
  "progress_kind": null,
  "session_id": "ca-es-translation-new-anna"
}

  • Total count (2410) confirmed samples remained.

  • Inactive sessions (miguel, francesca, rosa) appear to be holding reserved tasks, reducing available pool.

  • Opening a new session appears to release reserved tasks back to the pool.

  • Work stealing is enabled, but doesn’t seem to kick in unless another session becomes active (refresh/join event).


:light_bulb: Hypothesis

Prodigy may be over-reserving tasks for inactive sessions. With five sessions configured and only two active, Anna could be exhausting her share of reserved tasks while others remain locked until their sessions refresh. Work stealing may only reassign tasks once a session “wakes up.”


:white_check_mark: Actions Taken

  • Verified browser console – no errors.

  • Reproduced multiple times, confirmed it’s not client-side.

  • Opening an unused session immediately unblocked Anna’s queue.

  • Monitored responses from /get_session_questions to confirm behavior.


:red_question_mark: Question for the Community

  • Has anyone seen this behavior in multi-session setups?

  • Is there a way to force work stealing or task redistribution without needing another annotator to refresh or join?

  • Could this be a known limitation or configuration edge case (e.g. with PRODIGY_ALLOWED_SESSIONS and default work-stealing)?

Any insights or workarounds would be very helpful!

Thanks in advance,

Miguel

Hey @miguelclaramunt,

Thanks for the detailed report!
Which feed overlap setting are you using? Is it no overlap (feed_overlap: false) or partial overlap (annotations_per_task setting)?
Thank you!

Hi Magda,

this is our current configuration in prodigy.json:

{
    ...
    "annotations_per_task": 1.2,
    "feed_overlap": false
}

Update: Marina is now experiencing the same issue. At ~1,150 annotated samples, her session stopped receiving tasks. For now, reloading (F5) the session temporarily resolves the problem.

Correction to the " :date: Detailed Timeline" section: Anna first hit the issue at ~1,230 samples, not ~2.1k; the 2.1k figure was the total number of samples annotated by Marina and Anna combined.

Hi @miguelclaramunt !

Thanks for the detailed report. The behavior you’re seeing is actually consistent with how Prodigy currently makes routing decisions in multi-annotator setups.

  1. When you set PRODIGY_ALLOWED_SESSIONS, the router (route_average_per_task) pre-assigns tasks to the session names in that list. These assignments are essentially reserved in the main stream. A task only becomes "open" (and thus stealable) when an annotator with that session ID connects and is served the task. The task is then moved to that session's _open_tasks list, as you can see in the get_questions method of the Session class (prodigy/components/session.py).

  2. The steal_work function is designed to take tasks from other sessions that are idle. It does this exclusively by iterating through the _open_tasks of other active Session objects.

  3. If an annotator in your PRODIGY_ALLOWED_SESSIONS list never connects, the tasks assigned to them remain in the main stream but never enter anyone's _open_tasks list. Consequently, they cannot be stolen. This is precisely the behavior you're observing.
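
To make the three points above concrete, here is a toy model of the reserve-then-steal logic. It is only a sketch and not Prodigy's actual code: ToyFeed, its round-robin pre-assignment, and run_demo are hypothetical names invented for illustration, and the idleness check that real work stealing performs is left out.

```python
from collections import deque


class ToyFeed:
    """Toy model of reserve-then-steal routing. NOT Prodigy's implementation."""

    def __init__(self, tasks, allowed_sessions):
        # Pre-assign every task to one allowed session name (round-robin,
        # an assumption made for simplicity).
        self.reserved = {name: deque() for name in allowed_sessions}
        for i, task in enumerate(tasks):
            owner = allowed_sessions[i % len(allowed_sessions)]
            self.reserved[owner].append(task)
        # Open-task lists exist only for sessions that have connected.
        self.open_tasks = {}

    def get_questions(self, name, batch_size=2):
        opened = self.open_tasks.setdefault(name, [])  # "connect" the session
        batch = []
        while self.reserved[name] and len(batch) < batch_size:
            batch.append(self.reserved[name].popleft())
        if batch:
            opened.extend(batch)  # served tasks are now visible to stealing
            return batch
        # Nothing left in our own reservation: try to steal instead.
        return self.steal_work(name, batch_size)

    def steal_work(self, name, batch_size):
        # Stealing only looks at the open tasks of *connected* sessions;
        # reservations of sessions that never connected are out of reach.
        stolen = []
        for other, opened in self.open_tasks.items():
            if other == name:
                continue
            while opened and len(stolen) < batch_size:
                stolen.append(opened.pop())
        return stolen


def run_demo():
    feed = ToyFeed(tasks=list(range(8)), allowed_sessions=["anna", "rosa"])
    first = feed.get_questions("anna")
    second = feed.get_questions("anna")
    blocked = feed.get_questions("anna")    # rosa has never connected
    rosa_batch = feed.get_questions("rosa")  # rosa connects once
    unblocked = feed.get_questions("anna")   # now there is something to steal
    return first, second, blocked, rosa_batch, unblocked


if __name__ == "__main__":
    print(run_demo())
```

In this toy run, anna's third request comes back empty even though four tasks are still reserved for rosa, and it only succeeds again after rosa has requested a batch. That mirrors the "No tasks available" message followed by the unblocking you saw when the rosa session was opened.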

This was discussed in more detail in an older thread. A couple of relevant points from there:

  • With PRODIGY_ALLOWED_SESSIONS, the router can plan more precisely, but it also assumes that all declared sessions will eventually connect. If some stay inactive, their share of tasks can remain locked.
  • Most importantly, it’s not obvious when a session should be considered “deprecated,” so the system errs on the side of keeping its reservations.
  • Work stealing only redistributes when a session “wakes up,” not proactively in the background. And this is perhaps one thing we could make configurable in future versions.

As suggested in the thread I mentioned, you could work around registered sessions that never connect by running a startup script that calls /get_session_questions once for each session right after launching the Prodigy server. By simulating a connection from each allowed annotator, you create Session objects for them and, by having them request one batch of tasks, you populate their _open_tasks, making those tasks available for stealing.
Here's an example of such a script:

import requests

PRODIGY_URL = "http://localhost:8080"

# Must match dataset-session_name pattern
SESSIONS = ["test-anna", "test-marina", "test-rosa", "test-miguel", "test-francesca"]

def initialize_sessions():
    """
    Call /get_session_questions once for each allowed session to
    populate their queues and trigger the router.
    """
    for session in SESSIONS:
        try:
            r = requests.post(
                f"{PRODIGY_URL}/get_session_questions",
                json={"session_id": session},
                timeout=10,  # don't hang forever if the server is unreachable
            )
            if r.status_code == 200:
                tasks = r.json().get("tasks", [])
                print(f"Session {session}: {len(tasks)} tasks initialized")
            else:
                print(f"Session {session} failed: {r.status_code} {r.text}")
        except (requests.RequestException, ValueError) as e:
            # Network/HTTP errors, or a response body that is not valid JSON
            print(f"Session {session} error: {e}")

if __name__ == "__main__":
    initialize_sessions()

Please note that in the POST request you should use the dataset-session_name pattern, e.g. my_dataset-anna, as this is the session ID that Prodigy creates during the Controller initialization.
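
If you'd rather not hard-code the SESSIONS list, a small helper along these lines could derive the IDs from the dataset name and the same comma-separated value you pass to PRODIGY_ALLOWED_SESSIONS. This is just a sketch: build_session_ids and from_allowed_sessions are hypothetical helper names, not part of Prodigy, and the only assumption about Prodigy is the dataset-session_name pattern described above.

```python
def build_session_ids(dataset, annotators):
    """Return session IDs following the dataset-session_name pattern."""
    # Strip stray whitespace and skip empty entries.
    return [f"{dataset}-{name.strip()}" for name in annotators if name.strip()]


def from_allowed_sessions(dataset, allowed_sessions):
    """Build session IDs from a comma-separated string of session names."""
    return build_session_ids(dataset, allowed_sessions.split(","))


if __name__ == "__main__":
    print(from_allowed_sessions("my_dataset", "miguel,francesca, rosa, anna, marina"))
```

The stripping matters because a list such as "miguel,francesca, rosa" mixes entries with and without a space after the comma, and a session ID with a stray space would not match the one Prodigy created.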
