The way the web app works is actually pretty straightforward: every time a user opens the site, it makes a request to the /get_questions
endpoint, which returns the next batch from the stream. This means that if two clients connect to the same session, they'll get different batches of data – whatever is next up on the queue.
If you want to have multiple people annotate the same data, I'd recommend starting multiple instances – for example, run Prodigy on different ports (e.g. by setting the PRODIGY_PORT
environment variable when you execute the command). Each annotator could then also have their own dedicated dataset that their answers are saved to, so you'll be able to compare the work of the individual annotators later on.
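For example, a sketch like the following would start one process per annotator, each on its own port and saving to its own dataset. The annotator names, ports, dataset names, recipe and file paths here are all placeholders – swap in whatever command you're actually running:

```python
import os
import subprocess

# Hypothetical annotators and the ports they'll annotate on
annotators = {"alex": 8080, "jess": 8081}

for name, port in annotators.items():
    # Override the port via the PRODIGY_PORT environment variable
    env = dict(os.environ, PRODIGY_PORT=str(port))
    # One process per annotator, each saving to its own dedicated dataset
    subprocess.Popen(
        ["prodigy", "ner.manual", f"news_ner_{name}", "en_core_web_sm",
         "news.jsonl", "--label", "PERSON,ORG"],
        env=env,
    )
```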
There's no simple answer here – how you want to handle conflicting annotations later on is something you'll have to decide. If the annotators all add to their own datasets, you'll be able to export the data and compare it to find and resolve conflicts.
One strategy could be to take the datasets and find examples with the same _task_hash
(i.e. the same question) but different answers. You could then use a threshold of, say, 80% agreement to decide whether to include an example: if 80% of annotators agree, you include it – otherwise, you don't, or you re-annotate it yourself to make the final decision.
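Here's a rough sketch of how that could look, using the database API to load each annotator's dataset and group the answers by _task_hash. The dataset names are placeholders, and it assumes a binary accept/reject-style workflow where the decision lives in the "answer" key – for manual interfaces you'd compare the "spans" (or whatever field holds the annotation) instead:

```python
from collections import defaultdict
from prodigy.components.db import connect

# Hypothetical dataset names - one dataset per annotator, as suggested above
datasets = ["news_ner_alex", "news_ner_jess", "news_ner_sam"]
db = connect()  # uses the database settings from your prodigy.json

# Group the answers by _task_hash (i.e. by question), keeping track of
# which annotator's dataset each answer came from
answers_by_task = defaultdict(dict)
for dataset in datasets:
    # Depending on your Prodigy version, this may be db.get_dataset_examples
    for eg in db.get_dataset(dataset):
        answers_by_task[eg["_task_hash"]][dataset] = eg["answer"]

THRESHOLD = 0.8  # 80% agreement

accepted, needs_review = [], []
for task_hash, by_annotator in answers_by_task.items():
    answers = list(by_annotator.values())
    # Fraction of annotators who gave the most common answer
    agreement = max(answers.count(a) for a in set(answers)) / len(answers)
    if agreement >= THRESHOLD:
        accepted.append(task_hash)
    else:
        needs_review.append(task_hash)

print(f"{len(accepted)} tasks accepted, {len(needs_review)} tasks to review")
```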
Maybe you'll also find that it's usually the same annotator who disagrees with everyone else – this could indicate a misunderstanding about the annotation scheme. This is obviously super important and something you want to find out as soon as possible, so I'd recommend exporting and analysing the data with these questions in mind very early on in the process.
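Building on the answers_by_task mapping from the snippet above, you could count how often each annotator ends up in the minority on tasks with disagreement – if one dataset dominates that list, it's worth checking in with that annotator about the annotation scheme:

```python
from collections import Counter

# Count how often each annotator is in the minority on disagreeing tasks
minority_counts = Counter()
for task_hash, by_annotator in answers_by_task.items():
    answers = list(by_annotator.values())
    if len(set(answers)) == 1:
        continue  # everyone agrees, nothing to check
    majority_answer = Counter(answers).most_common(1)[0][0]
    for dataset, answer in by_annotator.items():
        if answer != majority_answer:
            minority_counts[dataset] += 1

# Annotators who show up disproportionately often here may have
# misunderstood the annotation scheme
print(minority_counts.most_common())
```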
I'd also recommend checking out the following add-on, which was developed by a fellow Prodigy user. It includes a range of features for using the tool with multiple annotators and getting stats and analytics:
We're also working on an extension product, the Prodigy Annotation Manager, which is very close to a public beta now. The app will have a service component and let you manage multiple users, analyse their results and performance, create complex annotation workflows interactively and build larger corpora and labelled datasets. If that sounds relevant, definitely keep an eye on the forum for the official announcement.