Manual Annotation Dataset limit

Hi,

I am doing manual annotations for a classifier. I want to set a threshold on the number of "Accepted" annotations.

For example: If we have 100 (Accept) samples the annotation can end. I am not using a classifier to train, it's a manual annotation.

Hi! In Prodigy v1.10, you could implement something like that using the validate_answer callback. It's not 100% what that function was originally designed to do, but it should work 🙂 The annotator will then see an alert once 100 accepted answers have been submitted and won't be able to submit any more.

To count the existing accepted answers, you could use the update callback, which gives you access to the batches of annotations that come back to the server. Here's a simple example:

total_accepted = 0

def update(answers):
    # Called with each batch of answers sent back to the server
    global total_accepted
    total_accepted += len([eg for eg in answers if eg["answer"] == "accept"])

def validate_answer(eg):
    # Raising an error shows an alert in the UI and blocks the submission
    if eg["answer"] == "accept" and total_accepted >= 100:
        raise ValueError("Enough accepted answers, you can stop :)")

One thing to keep in mind is that, depending on the batch size, there may be a small delay before the annotator sees the alert, because the batches of answers first have to be sent back to the server before Prodigy knows that 100 annotations are there. You could minimise that by using a lower batch size or by setting "instant_submit": True to submit each answer immediately as it's made in the app.
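If you're using a custom recipe, one way to apply those settings is via the "config" dictionary the recipe returns. Here's a minimal sketch, assuming a text classification UI and a JSONL source file; the recipe name and placeholders are just examples, not part of the original setup:

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe("textcat-limited")
def textcat_limited(dataset, source):
    stream = JSONL(source)  # load the examples to annotate from a JSONL file
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "classification",
        "update": update,                   # callbacks defined above
        "validate_answer": validate_answer,
        "config": {
            "batch_size": 1,                # send answers back right away
            "instant_submit": True,         # submit each answer as it's made
        },
    }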


Thanks @Ines.

This works. A quick follow-up question: can I also include the Accept, Reject and Ignore counts from previous annotations, where this callback wasn't in place, and compute the total?

Maybe my question wasn't clear. I need the total counts across multiple sessions, i.e. to draw the counts from the answers in the database, not just the current annotations.

Thanks,
Vaishnavi

In that case, you could connect to the database and call db.get_dataset to load a dataset and/or session and pre-populate your counts. You don't want to do that within the validate_answer callback, because it would then be re-computed every time a user submits an answer. If you do expect the dataset to change while the user annotates, you could refresh the counts periodically instead, so the computation stays fast and inexpensive.
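Here's a minimal sketch of what that could look like, assuming the existing annotations live in a dataset called "my_dataset" (the name is hypothetical) and the counter is the same total_accepted used above:

from prodigy.components.db import connect

def count_accepted(dataset_name):
    # Load all previously saved annotations and count the accepted ones
    db = connect()
    examples = db.get_dataset(dataset_name) or []  # None if the dataset doesn't exist yet
    return len([eg for eg in examples if eg.get("answer") == "accept"])

# Pre-populate the counter once on startup, outside of validate_answer
total_accepted = count_accepted("my_dataset")

If other sessions can add to the dataset while annotation is running, you could re-run count_accepted from the update callback every few batches instead of recomputing it on every answer.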