Are you using the named multi-user sessions with ?session=name appended to the URL? This should let you handle both possible scenarios: have all users label the same examples, or have all examples labelled only once by whichever user is available. You can find more details on this in the PRODIGY_README.html.
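For example, assuming the server is running on the default host and port, each annotator would open their own URL like http://localhost:8080/?session=alex or http://localhost:8080/?session=sofia, and everything they submit will be associated with that session name.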
To answer your other question:
This is currently expected and there's no easy answer: Prodigy can't know whether an example is coming back, how long the annotators are taking, or whether they're actually labelling at all. It sends out a batch of questions and waits for a batch of answers to come back. If a batch never comes back, you'll only find out later, e.g. when you restart the server.
Alternatively, you can implement your own logic and have something like an infinite loop that keeps sending out examples until all of them are annotated by someone. I've outlined the idea step-by-step in my post here:
However, if you have multiple people labelling at the same time, you'd still have to decide how you want to handle the delays and timeouts. When do you consider an example "gone" and send it out to someone else? Answers are sent back in batches, so depending on how complex the annotations are, it can easily take a while until a full batch of answers is sent back. (We also had to come up with a solution for this in Prodigy Scale and in the end, we did solve it with timeouts and some other checks – but it's still tricky.)
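To sketch the general idea (this is not the exact code from the post linked above, and make_infinite_stream / source_examples are just placeholder names), the stream could be a generator that re-checks the database on every pass and only sends out whatever hasn't been annotated yet:

from prodigy import set_hashes
from prodigy.components.db import connect

def make_infinite_stream(source_examples, dataset):
    # Keep looping over the raw examples and only send out the ones that
    # aren't in the target dataset yet. "dataset" is the name the
    # annotators are saving their answers to.
    db = connect()
    source_examples = [set_hashes(eg) for eg in source_examples]
    while True:
        annotated = db.get_dataset(dataset) or []
        done = {eg["_task_hash"] for eg in annotated if "_task_hash" in eg}
        todo = [eg for eg in source_examples if eg["_task_hash"] not in done]
        if not todo:
            break  # everything has been annotated by someone
        for eg in todo:
            yield eg

You'd return a generator like this as the stream from a custom recipe – combined with named sessions, whoever asks for questions next simply gets whatever is still missing. The trade-off is that examples may be sent out more than once, which is where the timeout question above comes in.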
If you annotate with named multi-user sessions, Prodigy will now add a "_session_id" key to each example that consists of the dataset name and the session name – for example, my_cool_set-ines. For each example you fetch from the database, you'll be able to get the named session it was annotated in, so you could write your own script that computes the counts you need.
Here's a simple example – untested, but something along those lines should work:
from collections import Counter
from prodigy.components.db import connect

counts = Counter()
db = connect()
examples = db.get_dataset("your_dataset")
for eg in examples:
    # "_session_id" is "<dataset>-<session>", so take the part after the
    # last "-" (assuming the session names themselves don't contain one)
    session = eg["_session_id"].rsplit("-", 1)[-1] if "_session_id" in eg else "n/a"
    counts[session] += 1
print(counts)
The example will also give you access to the full annotations – so you could even calculate things like labels per session, accepts vs. rejects per session and so on.
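Just to sketch what that could look like (again untested, and assuming span-based annotations like the ones produced by an NER recipe – adjust for your interface):

from collections import Counter, defaultdict
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("your_dataset")
answers_per_session = defaultdict(Counter)  # accept / reject / ignore counts
labels_per_session = defaultdict(Counter)   # label counts

for eg in examples:
    session = eg["_session_id"].rsplit("-", 1)[-1] if "_session_id" in eg else "n/a"
    answers_per_session[session][eg.get("answer", "n/a")] += 1
    # assuming the annotations are stored as spans with a "label"
    for span in eg.get("spans", []):
        labels_per_session[session][span["label"]] += 1

print(dict(answers_per_session))
print(dict(labels_per_session))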
What do you mean by "progress of the overall process"? How much of the given input data is already present in an (annotated) Prodigy dataset?