Can active learning help reduce annotation inconsistencies?

As the dataset grows, it's hard to maintain consistency across the entire set.
Especially since the first annotations were made months ago, and I now have more in-depth knowledge of the data.

Can active learning help with this? For example, by comparing against a threshold, pick out the annotations that "surprise" the model the most and ask the annotator: "does this annotation still look OK to you?"

If this could be done, it would help clean up inconsistent and wrong annotations. I'm currently doing that manually.
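
To make that concrete, here is a rough sketch of the kind of check I'm imagining, assuming a text classification model; the model path, the exported annotations file and the top-level "label" field are just placeholders:

```python
import json
import spacy

nlp = spacy.load("./current_model")  # placeholder: latest model trained on the data
THRESHOLD = 0.5  # how "surprised" the model has to be before an example is flagged

flagged = []
with open("annotations.jsonl", encoding="utf8") as f:  # exported annotations
    for line in f:
        eg = json.loads(line)
        label = eg.get("label")  # assumes single-label examples with a "label" field
        if label is None:
            continue
        doc = nlp(eg["text"])
        # Score the model assigns to the label that was originally annotated
        score = doc.cats.get(label, 0.0)
        if score < THRESHOLD:
            flagged.append(eg)  # model is "surprised": ask the annotator again

print(f"{len(flagged)} annotations to double-check")
```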

Hi,

This is an interesting idea!

I think it should be relatively doable to implement. Basically, what you want to do is take a portion of the later annotations - one that is representative of the task and large enough to train a model on. Once you have that model trained, you can run it on the texts of the earlier annotations and programmatically compare the model's predictions against your original (older) annotations. Whenever they diverge, set the original (older) annotation aside in a separate dataset. You'll (hopefully) end up with a much smaller dataset to review, containing potentially "conflicting" annotations that you might annotate differently now. You can then use Prodigy's review recipe to go over them manually :slight_smile:
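
Here's a minimal sketch of what that comparison could look like for an NER task, assuming you've exported the older dataset with `prodigy db-out` and trained a spaCy model on the newer annotations (file and model paths are placeholders):

```python
import json
import spacy

# Model trained on the newer, more consistent annotations (placeholder path)
nlp = spacy.load("./model_from_recent_annotations")

def spans_from_doc(doc):
    """Model predictions as a comparable set of (start, end, label) tuples."""
    return {(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}

def spans_from_example(eg):
    """Older gold annotations in the same (start, end, label) format."""
    return {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}

conflicts = []
with open("old_annotations.jsonl", encoding="utf8") as f:  # prodigy db-out export
    for line in f:
        eg = json.loads(line)
        doc = nlp(eg["text"])
        # Keep only examples where the new model disagrees with the old annotation
        if spans_from_doc(doc) != spans_from_example(eg):
            conflicts.append(eg)

with open("conflicting_annotations.jsonl", "w", encoding="utf8") as f:
    for eg in conflicts:
        f.write(json.dumps(eg) + "\n")

print(f"{len(conflicts)} potentially conflicting annotations to re-review")
```

You could then load `conflicting_annotations.jsonl` back into a fresh dataset with `prodigy db-in` and go over it manually, e.g. via the review recipe mentioned above or by re-annotating it with `ner.manual`.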
