I was wondering why there is no NER recipe that selects documents using active learning then lets the user correct the tags right away (like ner.correct does)?
What is the benefit of separating the process into two steps with ner.teach and ner.silver-to-gold?
(I have beta tested another labeling tool that would select a document through active learning, apply the model to predict all the tags, then let the user correct them. This seemed way more efficient to me, hence my confusion.)
Thanks for your question! I think you're asking why the active learning recipe frames tasks as binary decisions, rather than as manual annotation like ner.correct does. This has come up before:
As Matt mentioned, this is just a design choice to avoid cramming too many things into the built-in recipes.
There is nothing stopping you from developing your own custom recipe to do this. It's important to think of the built-in recipes as the floor of what's possible, not the ceiling. They're there to get you started with smart defaults, but may need to be modified or extended.
One Prodigy pro tip: You can view the built-in recipes by finding your installed Prodigy package location (run prodigy stats and view Location:), and then find the recipes folder. For example, the ner recipes can be found in recipes/ner.py. If you want, you can combine and modify the recipes to your preferences by using different sorters. The NER docs outline pseudo-code for writing a custom recipe with NER active learning; there's also a rough sketch below of what combining the two approaches could look like.
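For illustration, here's a minimal sketch of a custom recipe that pre-annotates each document with the model's predictions (like ner.correct), sorts the stream by uncertainty (like ner.teach), and serves everything in the manual correction interface. The recipe name ner.teach-correct and the score_doc function are hypothetical placeholders, not built-ins: you'd want to swap in a real uncertainty estimate, and a full active-learning loop would also return an update callback to train the model as annotations come in, which this sketch omits.

```python
import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens
from prodigy.components.sorters import prefer_uncertain
from prodigy.util import split_string


@prodigy.recipe(
    "ner.teach-correct",  # hypothetical name, not a built-in recipe
    dataset=("Dataset to save annotations to", "positional", None, str),
    spacy_model=("Loadable spaCy pipeline", "positional", None, str),
    source=("Path to a JSONL file with a 'text' key", "positional", None, str),
    label=("Comma-separated labels to annotate", "option", "l", split_string),
)
def ner_teach_correct(dataset, spacy_model, source, label):
    nlp = spacy.load(spacy_model)

    def score_doc(doc):
        # Placeholder uncertainty score: swap in your own estimate here.
        # This naive stand-in just treats docs with fewer predicted
        # entities as less certain.
        return 1.0 / (1.0 + len(doc.ents))

    def make_tasks(stream):
        # Pre-annotate each example with the model's predictions so the
        # annotator only has to correct them, like ner.correct does.
        texts = ((eg["text"], eg) for eg in stream)
        for doc, eg in nlp.pipe(texts, as_tuples=True):
            spans = [
                {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                for ent in doc.ents
                if not label or ent.label_ in label
            ]
            task = prodigy.set_hashes({**eg, "spans": spans})
            # prefer_uncertain consumes (score, example) tuples
            yield score_doc(doc), task

    stream = prefer_uncertain(make_tasks(JSONL(source)))
    stream = add_tokens(nlp, stream)  # ner_manual needs token information

    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "ner_manual",  # full manual correction interface
        "config": {"labels": label},
    }
```

Assuming the file is saved as recipe.py, you'd run it with something like: prodigy ner.teach-correct my_dataset en_core_web_sm ./data.jsonl --label PERSON,ORG -F recipe.py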
Somewhat related, Matt mentioned in this post earlier some of the evolution in Prodigy's custom recipe design and where we've had to rethink over time:
Thank you for this feedback! I'm going to write up an internal ticket to explore this more. If you have other feedback, please fill out our user survey. It's given us a treasure trove of fixes and enhancements for our upcoming releases.
Thank you for the detailed answer!
OK, I think I understand better now why you would want to split it in two, focusing first on finding the most uncertain predictions without carefully going through the whole document.
In my case, ner.teach usually comes back to the same document multiple times for different spans, but I guess this is because my model is not that good yet?