If I have a project with 10K records to annotate/label and a team of X people with individual session names, is it possible to set a limit at the per-record/document level that says: collect Y labels?
I see the option to set the stream to single-session or multi-session, but I haven't been able to determine whether setting such a limit is possible.
Example: team of 10 people. Each document should be labeled no more than 4 times.
This isn't natively supported by Prodigy at this point, but I'm wondering if you could do something clever by preparing the examples.jsonl upfront.
You could make an examples_alice.jsonl to give to employee #1 (whom I'll call Alice) and another examples_bob.jsonl to give to employee #2 (whom I'll call Bob). This does assume that both Alice and Bob are able to run Prodigy locally while sending their datasets to a shared database.
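A minimal sketch of that prep step, just to make the idea concrete. The annotator names, file paths and the `LABELS_PER_RECORD` value are placeholders you'd swap for your own setup:

```python
import json
import random

# Placeholders: adjust the annotator list, file paths and LABELS_PER_RECORD
# to your own setup.
ANNOTATORS = ["alice", "bob", "carol", "dave"]
LABELS_PER_RECORD = 2  # the "Y" from the question

def split_examples(path="examples.jsonl"):
    with open(path, encoding="utf8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    per_annotator = {name: [] for name in ANNOTATORS}
    for record in records:
        # Give each record to Y randomly chosen annotators.
        for name in random.sample(ANNOTATORS, LABELS_PER_RECORD):
            per_annotator[name].append(record)

    for name, examples in per_annotator.items():
        with open(f"examples_{name}.jsonl", "w", encoding="utf8") as out:
            for eg in examples:
                out.write(json.dumps(eg) + "\n")

if __name__ == "__main__":
    split_examples()
```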
Running locally is the limiting factor. Our Prodigy instance is deployed on EC2 and exposed to our teams and vendors. We use named sessions to track work and aggregate labels/annotations by agent. We cannot have a local deployment for each user; they are not working in a technical environment.
We are considering doing something similar with our data files: running multiple instances and spreading annotators across the different instance URLs. Each dataset would then be "swapped" to the other instance to collect adjudication labels.
Here's a more detailed use case for thinking about how to support it:
We have a dataset of 10K columnA/columnB pairs and want a similarity label on a 4-point scale. We want each document to appear as many times as it takes to collect 3 matching labels on one of the 4 options. Once a document receives 3 or more matching labels on one option, it is removed from the stream. This lets us use an agent force of any size without having to send every record to each agent.
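In code, the stopping rule we're after looks roughly like this (just the logic, not tied to anything Prodigy-specific):

```python
from collections import Counter

def is_done(labels, required_matches=3):
    # labels: the scale values collected so far for one columnA/columnB pair,
    # e.g. ["2", "2", "3"]. Done once any single option reaches 3 matches.
    counts = Counter(labels)
    return bool(counts) and counts.most_common(1)[0][1] >= required_matches

print(is_done(["2", "2", "3"]))       # False: no option has 3 matching labels yet
print(is_done(["2", "2", "3", "2"]))  # True: option "2" has reached 3 matches
```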
This is something I might also have suggested in your situation.
Another idea would be to run a daily batch job that prepares the datasets for each annotator and resets the associated Prodigy service. The benefit is that you can fully customise which examples are shown to each user and in what order they'll see them. The main downside is the additional complexity of maintaining such a cronjob, and that this is very much an offline update that needs to happen in order to keep annotating.
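A rough sketch of what that batch job could do, under a few assumptions spelled out in the comments: the shared dataset is called `similarity_labels`, your recipe stores the chosen scale value in a top-level `label` field, and 3 matching labels on one option means a record is done. You'd adapt all of that to your actual recipe:

```python
import json
from collections import Counter, defaultdict

from prodigy.components.db import connect

DATASET = "similarity_labels"   # assumed name of the shared dataset
REQUIRED_MATCHES = 3

def get_finished_hashes():
    db = connect()  # uses the database settings from prodigy.json / env vars
    examples = db.get_dataset(DATASET) or []  # get_dataset_examples in newer versions
    labels_by_record = defaultdict(list)
    for eg in examples:
        if eg.get("answer") == "accept":
            # Assumes the recipe stores the chosen scale value in a top-level
            # "label" field; adjust to however your recipe records the choice.
            labels_by_record[eg["_input_hash"]].append(eg.get("label"))
    finished = set()
    for input_hash, labels in labels_by_record.items():
        counts = Counter(labels)
        if counts and counts.most_common(1)[0][1] >= REQUIRED_MATCHES:
            finished.add(input_hash)
    return finished

def write_remaining(source="examples.jsonl", dest="examples_remaining.jsonl"):
    # Keep only the records that haven't reached the agreement threshold yet.
    # Assumes the source file carries the same _input_hash values as the saved
    # annotations (e.g. because it was hashed before being split per annotator).
    finished = get_finished_hashes()
    with open(source, encoding="utf8") as src, open(dest, "w", encoding="utf8") as out:
        for line in src:
            eg = json.loads(line)
            if eg.get("_input_hash") not in finished:
                out.write(json.dumps(eg) + "\n")

if __name__ == "__main__":
    write_remaining()
```

The output file(s) would then be what you point each restarted Prodigy service at for the next day's work.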