Why is there no active learning for manual/gold standard annotations?

Hi there,

Apologies in case this was brought up before or in the documentation, I haven’t found a good answer.

From what I understand (and I might be wrong, I haven’t studied the details of the active learning strategy yet), the binary active learning workflow performs one weight update for each batch of binary annotations. The model then scores the upcoming examples and asks the user about the entities it’s most uncertain about.
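To make the idea concrete, here is a schematic sketch of that loop in plain Python. It only illustrates uncertainty sampling (scores closest to 0.5 get queued first); the function names and the toy scores are made up for illustration and are not Prodigy internals.

```python
# Schematic uncertainty-sampling sketch; names and scores are illustrative,
# not Prodigy's actual internals.

def uncertainty(score: float) -> float:
    """Distance from 0.5: smaller means the model is less certain."""
    return abs(score - 0.5)

def most_uncertain(scored_examples, n):
    """Pick the n examples whose scores are closest to 0.5."""
    return sorted(scored_examples, key=lambda ex: uncertainty(ex["score"]))[:n]

examples = [
    {"text": "Acme Corp",  "score": 0.92},
    {"text": "blue sky",   "score": 0.48},
    {"text": "John Smith", "score": 0.55},
    {"text": "the table",  "score": 0.10},
]

# The two examples nearest 0.5 are shown to the annotator first.
queue = most_uncertain(examples, 2)
print([ex["text"] for ex in queue])  # ['blue sky', 'John Smith']
```

After each batch of yes/no answers, the model would be updated and the remaining examples re-scored, so the queue keeps shifting toward whatever the model is currently unsure about.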

While you have made a compelling point on why the binary workflow is great for quick annotation, it requires starting with a model that already has at least some idea of what the entities are.

I am wondering why the gold-standard-creation recipe does not update the model along the way. It already uses the model to suggest entities (making it easier for the user to quickly correct them instead of marking everything manually). Updating the model after each batch (or even after every single annotated paragraph) would seem to make it better and better as annotation proceeds, reducing the number of manual corrections the user needs to make.

You can definitely provide an update() callback function in a recipe that uses the manual interface. Custom recipes are very useful in general, as the Python API is much more powerful than the recipe arguments.
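For anyone reading along, the shape of such a recipe can be sketched in plain Python: a recipe returns a dict of components, including a stream of tasks and an update callback that receives each batch of answers. The sketch below mimics that shape without importing Prodigy; `FakeModel`, `suggest` and `train_on_batch` are illustrative stand-ins, not the real API.

```python
# Plain-Python sketch of a manual recipe with an update callback.
# FakeModel and its methods are illustrative stand-ins, not Prodigy's API.

class FakeModel:
    """Stand-in for an NER model that can suggest spans and be updated."""
    def __init__(self):
        self.updates = 0

    def suggest(self, text):
        # Pretend to pre-highlight entity spans for the annotator to correct.
        return []

    def train_on_batch(self, answers):
        self.updates += 1  # one weight update per annotated batch

def manual_recipe(dataset, source, model):
    def stream():
        for text in source:
            yield {"text": text, "spans": model.suggest(text)}

    def update(answers):
        # Called with each batch of corrected annotations, so the
        # suggestions improve as annotation proceeds.
        model.train_on_batch(answers)

    return {
        "dataset": dataset,
        "stream": stream(),
        "view_id": "ner_manual",
        "update": update,
    }

model = FakeModel()
components = manual_recipe("gold_ner", ["Some text.", "More text."], model)
for task in components["stream"]:
    components["update"]([task])  # simulate one answered batch per task
print(model.updates)  # 2
```

The key point is just that the update callback closes over the same model that generates the stream, so corrections made in the manual interface can feed back into the suggestions.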

Ultimately we decided to avoid trying to cram every combination of possible options into the built-in CLI. If you make the CLI too complicated, you end up programming via the CLI and learning too many arbitrary details; at some point it’s better to switch over to Python. The source for the built-in recipes ships with Prodigy, and you can find starter custom recipes here: https://github.com/explosion/prodigy-recipes

You’re right, I’ve been relying on the CLI so far since it’s easier to get started with, but it’s probably time I switched to using the API directly.

Thanks for your quick answer as usual.