If I have a project with 10K records to annotate/label and a team of X people with individual session names, is it possible to set a limit at the per-record/document level that says: collect Y labels?
I see the option to set the stream to single-session or multi-session, but I haven't been able to determine whether setting such a limit is possible.
Example: team of 10 people. Each document should be labeled no more than 4 times.
This isn't natively supported by Prodigy at this point, but I'm wondering if you could do something clever by preparing the examples.jsonl upfront.
You could make an examples_alice.jsonl to give to employee #1 (whom I'll call Alice) and another examples_bob.jsonl to give to employee #2 (whom I'll call Bob). This does assume that both Alice and Bob are able to run Prodigy locally while sending their datasets to a shared database.
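A minimal sketch of that prep step, just to make the idea concrete. The annotator names, file paths and the `LABELS_PER_RECORD` value are placeholders you'd swap for your own setup:

```python
import json
import random

# Placeholders: adjust the annotator list, file paths and LABELS_PER_RECORD
# to your own setup.
ANNOTATORS = ["alice", "bob", "carol", "dave"]
LABELS_PER_RECORD = 2  # the "Y" from the question

def split_examples(path="examples.jsonl"):
    with open(path, encoding="utf8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    per_annotator = {name: [] for name in ANNOTATORS}
    for record in records:
        # Give each record to Y randomly chosen annotators.
        for name in random.sample(ANNOTATORS, LABELS_PER_RECORD):
            per_annotator[name].append(record)

    for name, examples in per_annotator.items():
        with open(f"examples_{name}.jsonl", "w", encoding="utf8") as out:
            for eg in examples:
                out.write(json.dumps(eg) + "\n")

if __name__ == "__main__":
    split_examples()
```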
Running locally is the limiting factor. Our Prodigy instance is deployed on EC2 and exposed to our teams and vendors. We use named sessions to track work and aggregate labels/annotations by agent. We cannot have a local deployment for each user; they are not working in a technical environment.
We are considering doing something similar with our data files: running multiple instances and spreading annotators across the different instance URLs. Each dataset would then be "swapped" to the other instance to collect adjudication labels.
Here's a more detailed use case for thinking about how to support it:
We have a dataset of 10K columnA/columnB pairs and want a similarity label on a 4-point scale. We want each document to appear as many times as it takes to collect 3 matching labels on one of the 4 options. Once a document receives 3 or more matching labels on one option, it is removed from the stream. This lets us use an agent force of any size without having to send every record to each agent.
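In code, the stopping rule we're after looks roughly like this (just the logic, not tied to anything Prodigy-specific):

```python
from collections import Counter

def is_done(labels, required_matches=3):
    # labels: the scale values collected so far for one columnA/columnB pair,
    # e.g. ["2", "2", "3"]. Done once any single option reaches 3 matches.
    counts = Counter(labels)
    return bool(counts) and counts.most_common(1)[0][1] >= required_matches

print(is_done(["2", "2", "3"]))       # False: no option has 3 matching labels yet
print(is_done(["2", "2", "3", "2"]))  # True: option "2" has reached 3 matches
```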
This is something I might also have suggested in your situation.
Another idea would be to run a daily batch job that prepares the datasets for each annotator and resets the associated Prodigy service. The benefit is that you can fully customise which examples are shown to each user and in what order they'll see them. The main downside is the additional complexity of maintaining such a cronjob, and that this is very much an offline update that needs to happen in order to keep annotating.
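A rough sketch of what that batch job could do, under a few assumptions spelled out in the comments: the shared dataset is called `similarity_labels`, your recipe stores the chosen scale value in a top-level `label` field, and 3 matching labels on one option means a record is done. You'd adapt all of that to your actual recipe:

```python
import json
from collections import Counter, defaultdict

from prodigy.components.db import connect

DATASET = "similarity_labels"   # assumed name of the shared dataset
REQUIRED_MATCHES = 3

def get_finished_hashes():
    db = connect()  # uses the database settings from prodigy.json / env vars
    examples = db.get_dataset(DATASET) or []  # get_dataset_examples in newer versions
    labels_by_record = defaultdict(list)
    for eg in examples:
        if eg.get("answer") == "accept":
            # Assumes the recipe stores the chosen scale value in a top-level
            # "label" field; adjust to however your recipe records the choice.
            labels_by_record[eg["_input_hash"]].append(eg.get("label"))
    finished = set()
    for input_hash, labels in labels_by_record.items():
        counts = Counter(labels)
        if counts and counts.most_common(1)[0][1] >= REQUIRED_MATCHES:
            finished.add(input_hash)
    return finished

def write_remaining(source="examples.jsonl", dest="examples_remaining.jsonl"):
    # Keep only the records that haven't reached the agreement threshold yet.
    # Assumes the source file carries the same _input_hash values as the saved
    # annotations (e.g. because it was hashed before being split per annotator).
    finished = get_finished_hashes()
    with open(source, encoding="utf8") as src, open(dest, "w", encoding="utf8") as out:
        for line in src:
            eg = json.loads(line)
            if eg.get("_input_hash") not in finished:
                out.write(json.dumps(eg) + "\n")

if __name__ == "__main__":
    write_remaining()
```

The output file(s) would then be what you point each restarted Prodigy service at for the next day's work.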