Can active learning help reduce annotation inconsistencies?

Hi,

This is an interesting idea!

I think it should be relatively doable to implement. Basically, what you want to do is take a portion of the later annotations - a portion that is representative of the task and large enough to train a model on. Once you have that model trained, you can run it over the texts of the earlier annotations and programmatically compare the model's predictions with your original (older) annotations. Wherever they diverge, you can set the original (older) annotation aside in a separate dataset. You'll (hopefully) end up with a much smaller dataset to review, containing potentially "conflicting" annotations that you might annotate differently now. You can then use Prodigy's review recipe to go over them manually :slight_smile:
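
To make that more concrete, here's a minimal sketch of the "compare and set aside" step, assuming a text classification task, a model trained on the newer annotations, and older annotations exported to JSONL (e.g. via `prodigy db-out old_dataset > old.jsonl`). The model path and the `label`/`answer` field names are placeholders - they depend on the recipe you used, so adjust them to your data:

```python
import spacy
import srsly

# Model trained on the newer, more consistent annotations (hypothetical path)
nlp = spacy.load("./model-trained-on-newer-annotations")

diverging = []
for eg in srsly.read_jsonl("old.jsonl"):
    doc = nlp(eg["text"])
    # Take the model's highest-scoring category for this text
    predicted = max(doc.cats, key=doc.cats.get)
    # Keep only accepted examples whose original label disagrees with the model
    if eg.get("answer") == "accept" and eg.get("label") != predicted:
        diverging.append(eg)

# Write out only the conflicting examples so you can re-import them
# (e.g. `prodigy db-in conflicts conflicts.jsonl`) and review that
# much smaller dataset manually.
srsly.write_jsonl("conflicts.jsonl", diverging)
```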
