Non binary active learning

nsorros · October 7, 2022, 8:01am

Hey

Just wanted to ask if there is a reason behind the default binary interface for active learning recipes like ner.teach?

I assume this is to increase speed, but I wonder whether in some cases it's faster to correct the annotation directly while keeping the sorting that comes from the model in the loop.

This is motivated by a test annotating ORG entities in a news dataset, where ORG entities are fairly frequent. Manual annotation gives me predictable performance gains as the data size increases. Active learning with en_core_web_md, on the other hand, struggles over the first hundred examples, because I have to reject a lot of suggestions and can only accept the ones the model already gets right. I still feel that sorting the examples with a model in the loop provides benefits, but I think what might work better in this case is to actively correct the labels instead of accepting or rejecting them.

I wonder whether there is a best practice I might be missing here, one that would suggest using active learning only once the model performs above a reasonable threshold, so that most of the binary suggestions make sense.

ryanwesslen · October 7, 2022, 12:28pm

hi @nsorros!

Thanks for your questions!

This is a common question and has been answered in older posts like this one:

To better understand the design philosophy, I would recommend watching some of Matt and Ines' early talks around 2018 (slides repo and YouTube talks playlist). One example is this talk/slides and a related YouTube talk on binary active learning.

Fair point! Have you tried to create your own custom recipe to experiment?

A major design philosophy of Prodigy is to provide smart defaults while letting developers extend them, because the best solutions are likely custom. Custom recipes are the way to implement customized tasks.

For example, you can find some details in our docs about customizing active learning recipes with NER.

Alternatively, you can find many open recipes in our prodigy-recipes repo, including multiple NER examples. Perhaps try combining ner.teach with ner.manual and see how it goes!
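To make the idea of combining the two a bit more concrete, here is a rough, library-agnostic sketch of the stream logic: rank incoming texts by how uncertain the model is, but send them out with the predicted spans pre-filled so they can be corrected by hand instead of accepted or rejected. This is not Prodigy's implementation and it doesn't call any Prodigy API. The names `uncertainty` and `correction_tasks` and the `predict` callback are hypothetical, and the only thing borrowed from Prodigy is the general shape of a task dict ("text" plus "spans" with "start", "end" and "label").

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Span = Dict[str, object]           # {"start": int, "end": int, "label": str}
Prediction = Tuple[Span, float]    # (span, model confidence in [0, 1])
Predict = Callable[[str], List[Prediction]]  # hypothetical model wrapper


def uncertainty(predictions: List[Prediction]) -> float:
    """Score a text by its least certain span (confidence closest to 0.5).

    Returns a value between 0.0 (fully confident, or no spans at all)
    and 1.0 (a span with confidence exactly 0.5).
    """
    if not predictions:
        return 0.0
    return max(1.0 - abs(conf - 0.5) * 2.0 for _, conf in predictions)


def _rank(batch: List[str], predict: Predict) -> Iterator[dict]:
    # Run the model once per text, then put the most uncertain texts first.
    scored = [(text, predict(text)) for text in batch]
    scored.sort(key=lambda item: uncertainty(item[1]), reverse=True)
    for text, predictions in scored:
        # Pre-fill the spans so the annotator corrects them manually.
        yield {"text": text, "spans": [span for span, _ in predictions]}


def correction_tasks(
    texts: Iterable[str], predict: Predict, batch_size: int = 10
) -> Iterator[dict]:
    """Yield tasks batch by batch, each batch sorted by model uncertainty."""
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield from _rank(batch, predict)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield from _rank(batch, predict)
```

In a custom recipe, something like `correction_tasks` would sit between the loader and a manual NER interface, with `predict` wrapping whatever model is in the loop. Sorting within batches rather than over the whole dataset is a deliberate simplification here, so the stream can start serving examples before every text has been scored.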

Interesting idea - you should experiment and see what works! Keep us informed if you make any discoveries. We're always curious to hear about unique use cases.

nsorros · October 14, 2022, 1:35pm

Hi Ryan,

Thanks for your response. I understand that in most cases binary annotation reduces the time it takes to label data and reach a certain level of performance, which makes sense. It's just that in the toy example I was trying, it didn't seem to work as expected, hence my question.

I did in fact experiment with correcting the annotations versus the binary process, and the problem persists, so it's probably unrelated. I'll open a new thread to continue the discussion.
