Active learning with direct correction, instead of binary classification first

Hello,

I was wondering why there is no NER recipe that selects documents using active learning and then lets the user correct the tags right away (like ner.correct does)?
What is the benefit of separating the process in two with ner.teach and ner.silver-to-gold?

(I have beta tested another labeling tool that would select a document through active learning, apply the model to predict all the tags, then let the user correct them. This seemed way more efficient to me, hence my confusion.)

Bonjour @didmar!

Thanks for your question! I think you're asking why the active learning recipe frames tasks as binary, rather than as full manual annotation like ner.correct. This has come up before.

As Matt mentioned, this is simply a design choice to avoid cramming too many features into the built-in recipes.

There is nothing stopping you from developing your own custom recipe to do this. It's important to think of the built-in recipes as the floor of what's possible, not the ceiling. They're there to get you started with smart defaults, but may need to be modified or extended.

One Prodigy pro tip: you can view the built-in recipes by finding your installed Prodigy package location (run prodigy stats and check the Location: field), then opening the recipes folder there. For example, the ner recipes can be found in recipes/ner.py. If you want, you can combine and modify the recipes to your preferences, for instance by using different sorters. The NER docs outline pseudocode for writing a custom recipe with NER active learning.
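To make that concrete, here is a rough sketch of what a combined "active learning + correct" recipe could look like, assuming Prodigy's v1 recipe API. The recipe name ner.teach-correct is made up, and the uncertainty score below is a crude placeholder (the built-in ner.teach derives real per-span confidences via prodigy.models.ner.EntityRecognizer), but components like JSONL, prefer_uncertain, add_tokens and the ner_manual interface are the same ones the built-in recipes use:

```python
import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens
from prodigy.components.sorters import prefer_uncertain


@prodigy.recipe(
    "ner.teach-correct",  # hypothetical name, not a built-in recipe
    dataset=("Dataset to save annotations to", "positional", None, str),
    spacy_model=("Loadable spaCy pipeline with an NER component", "positional", None, str),
    source=("JSONL file with {'text': ...} entries", "positional", None, str),
)
def ner_teach_correct(dataset, spacy_model, source):
    nlp = spacy.load(spacy_model)

    def score_and_predict(stream):
        for eg in stream:
            doc = nlp(eg["text"])
            # Pre-fill the model's predictions so the annotator only has to
            # correct them, like ner.correct does.
            eg["spans"] = [
                {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                for ent in doc.ents
            ]
            # Crude stand-in for model uncertainty: the fraction of tokens
            # covered by predicted entities. Swap in real per-span confidences
            # (e.g. from prodigy.models.ner.EntityRecognizer) for actual
            # active learning.
            covered = sum(len(ent) for ent in doc.ents)
            score = covered / max(len(doc), 1)
            yield score, eg

    stream = JSONL(source)
    # prefer_uncertain consumes (score, example) tuples and yields the
    # examples whose scores sit closest to 0.5 first.
    stream = prefer_uncertain(score_and_predict(stream))
    # The ner_manual interface needs token information to make spans editable.
    stream = add_tokens(nlp, stream)

    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "ner_manual",  # full-document correction interface
        "config": {"labels": list(nlp.get_pipe("ner").labels)},
    }
```

You'd save this as, say, recipe.py and start the server with prodigy ner.teach-correct my_dataset en_core_web_sm ./texts.jsonl -F recipe.py (the -F flag points Prodigy at the file containing the custom recipe; the dataset, model and file names here are just examples).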

Somewhat related, Matt mentioned in an earlier post some of the evolution in Prodigy's custom recipe design and where we've had to rethink things over time.

Thank you for this feedback! I'm going to write up an internal ticket to explore it more. If you have other feedback, please fill out our user survey. It's given us a treasure trove of fixes and enhancements for our upcoming releases.


Hi @ryanwesslen!

Thank you for the detailed answer!
OK, I think I understand better now why you would want to split the process in two: first focus on finding the most uncertain predictions, without carefully going through the whole document.
In my case, ner.teach usually comes back to the same document multiple times for different spans, but I guess that's because my model isn't that good yet?