Can active learning help reduce annotation inconsistencies?

Hi,

This is an interesting idea!

I think it should be relatively doable to implement. Basically, what you want to do is take a portion of the later annotations - a portion that is representative of the task and large enough to train a model on. Once you have that model trained, you can run it over the texts of the earlier annotations and programmatically compare the model's predictions with your original (older) annotations. Wherever they diverge, you can set the original (older) annotation aside in a separate dataset. You'll (hopefully) end up with a much smaller dataset to review, containing potentially "conflicting" annotations that you might annotate differently now. You can then use Prodigy's review recipe to go over them manually :slight_smile:
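
To make that more concrete, here's a minimal sketch of the "compare and set aside" step, assuming a text classification task, a model trained on the newer annotations, and older annotations exported to JSONL (e.g. via `prodigy db-out old_dataset > old.jsonl`). The model path and the `label`/`answer` field names are placeholders - they depend on the recipe you used, so adjust them to your data:

```python
import spacy
import srsly

# Model trained on the newer, more consistent annotations (hypothetical path)
nlp = spacy.load("./model-trained-on-newer-annotations")

diverging = []
for eg in srsly.read_jsonl("old.jsonl"):
    doc = nlp(eg["text"])
    # Take the model's highest-scoring category for this text
    predicted = max(doc.cats, key=doc.cats.get)
    # Keep only accepted examples whose original label disagrees with the model
    if eg.get("answer") == "accept" and eg.get("label") != predicted:
        diverging.append(eg)

# Write out only the conflicting examples so you can re-import them
# (e.g. `prodigy db-in conflicts conflicts.jsonl`) and review that
# much smaller dataset manually.
srsly.write_jsonl("conflicts.jsonl", diverging)
```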
