We are using Prodigy 1.8.3 and want to support multiple annotators labeling a dataset. Each example should be labeled only once (so if person A labeled it, it should not be available to person B). A few problems are -
If one person opens the window but then doesn’t label anything, that image won’t be shown to other people.
We want to keep count of how many images were labeled by each labeler. If possible, it would be good to know who labeled which image.
Saw some suggestions in other posts about hosting on multiple ports, one for each user, but that is not desirable. We want to host a single instance of Prodigy to be used by all labelers.
Please let me know if any more information is required.
Another question - How can we check the progress of overall process on the dashboard?
Are you using the named multi-user sessions with ?session=name appended to the URL? This should let you handle both possible scenarios: having all users label the same examples, or having each example labelled only once by whichever user is available. You can find more details on this in the PRODIGY_README.html.
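To illustrate, here’s a rough sketch of the setup (the recipe, dataset and file names are just placeholders – use whatever you’re currently running; 8080 is Prodigy’s default port):

```shell
# start a single Prodigy instance once
prodigy ner.manual my_dataset en_core_web_sm ./data.jsonl --label SOME_LABEL

# each annotator opens the same app with their own session name:
#   http://localhost:8080/?session=alice
#   http://localhost:8080/?session=bob
```

Each answer saved from one of those URLs is then associated with that named session.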
To answer your other question:
This is currently expected and there’s no easy answer: Prodigy can’t know that an example isn’t coming back, how long the annotators are taking or that they aren’t actually labelling. It sends out a batch and waits for it to come back in a batch of answers. If it never comes back, you’ll only be able to know this later when you restart the server.
Alternatively, you can implement your own logic and have something like an infinite loop that keeps sending out examples until all of them are annotated by someone. I’ve outlined the idea step-by-step in my post here:
However, if you have multiple people labelling at the same time, you’d still have to decide how you want to handle the delays and timeouts. When do you consider an example “gone” and send it out to someone else? Answers are sent back in batches, so depending on how complex the annotations are, it can easily take a while until a full batch of answers is sent back. (We also had to come up with a solution for this in Prodigy Scale and in the end, we did solve it with timeouts and some other checks – but it’s still tricky.)
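The gist of that infinite loop could look something like the following sketch. To keep it self-contained, the "answered" check here is just a plain Python set – in a real recipe you’d query the Prodigy database (e.g. via task hashes) instead:

```python
def resend_unanswered(examples, answered_ids):
    """Cycle over the raw examples again and again, sending out every
    example whose id has no answer yet. `answered_ids` is a stand-in
    for a lookup against the Prodigy database."""
    while True:
        sent_any = False
        for eg in examples:
            if eg["id"] not in answered_ids:
                sent_any = True
                yield eg
        if not sent_any:
            return  # everything has an answer, stop the stream

examples = [{"id": 1}, {"id": 2}, {"id": 3}]
answered = {2}  # pretend id 2 was already annotated
stream = resend_unanswered(examples, answered)
task = next(stream)       # id 1 is sent out first
answered.add(task["id"])  # pretend the annotation came back
```

Examples that never come back simply stay in rotation and are offered to the next annotator, which is exactly where the timeout question from above comes in.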
If you annotate with named multi-user sessions, Prodigy will now add a "_session_id" key to each example that consists of the dataset name and the session name. For example, my_cool_set-ines. For each example you fetch from the database, you’ll be able to get the named session it was annotated in, so you could write your own script that computes the counts you need.
Here’s a simple example – untested, but something along those lines should work:
from collections import Counter
from prodigy.components.db import connect

counts = Counter()
db = connect()
examples = db.get_dataset("your_dataset")
for eg in examples:
    # "_session_id" is "<dataset>-<session>", so take the last part
    session = eg["_session_id"].split("-")[-1] if "_session_id" in eg else "n/a"
    counts[session] += 1
The example will also give you access to the full annotations – so you could even calculate things like labels per session, accept/rejects per session and so on.
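For instance, here’s a rough sketch of counting accepts/rejects per session. The `rows` below just mimic the dicts you’d get back from `db.get_dataset`, and splitting on `-` assumes the session name itself contains no hyphen:

```python
from collections import Counter, defaultdict

def session_stats(examples):
    """Count answers (accept/reject/ignore) per named session."""
    stats = defaultdict(Counter)
    for eg in examples:
        # "_session_id" is "<dataset>-<session>"; examples annotated
        # without a named session fall back to "n/a"
        session = eg["_session_id"].split("-")[-1] if eg.get("_session_id") else "n/a"
        stats[session][eg.get("answer", "n/a")] += 1
    return dict(stats)

# sample rows shaped like annotated examples from the database
rows = [
    {"_session_id": "my_cool_set-ines", "answer": "accept"},
    {"_session_id": "my_cool_set-ines", "answer": "reject"},
    {"_session_id": "my_cool_set-alex", "answer": "accept"},
]
```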
What do you mean by progress of overall process? How much of the given input data is already present in an (annotated) Prodigy dataset?