hi @FarisHijazi!
Thanks for your message and welcome to the Prodigy community. Interesting project!
Curious: is your goal to run experiments to determine the best strategies for using active learning to improve accuracy? I haven't seen Robert's code, but I'd be interested to learn more and can get back to you later.
One interesting thing for "simulating" active learning is to also add noise to the annotations (i.e., purposely make x% of annotations incorrect). If you run a simulation where you assume the annotator is correct every time, that doesn't reflect the reality that annotators make mistakes. I've devised AL experiments in the past and found that an important "hyperparameter" is the assumed accuracy of the annotators. Just another factor to consider in your experiments.
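For example, here's a minimal sketch of a noisy simulated annotator for a binary accept/reject task (the function and the 85% accuracy value are just illustrative, not anything built into Prodigy):

```python
import random

LABELS = ["accept", "reject"]

def simulate_annotator(true_label, accuracy=0.85):
    """Return the gold label with probability `accuracy`, otherwise a wrong one."""
    if random.random() < accuracy:
        return true_label
    return random.choice([label for label in LABELS if label != true_label])

# Replace each gold annotation with a (possibly flipped) simulated one
gold = ["accept", "reject", "accept", "accept"]
noisy = [simulate_annotator(label) for label in gold]
```

You can then sweep the `accuracy` value as a hyperparameter and see how sensitive your active learning results are to annotator error.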
Have you heard of or used Prodigy's entry points?
Entry points let you expose parts of a Python package you write to other Python packages. This lets one application easily customize the behavior of another by exposing an entry point in its `setup.py` or `setup.cfg`. For a quick and fun intro to entry points in Python, check out this excellent blog post. Prodigy can load custom functions from several different entry points, for example custom recipe functions. To see this in action, check out the `sense2vec` package, which provides several custom Prodigy recipes. The recipes are registered automatically if you install the package in the same environment as Prodigy. The following entry point groups are supported:
| Entry point group | Description |
| --- | --- |
| `prodigy_recipes` | Entry points for recipe functions. |
| `prodigy_db` | Entry points for custom `Database` classes. |
| `prodigy_loaders` | Entry points for custom loader functions. |
I haven't used these yet, but they may do the trick. Here's where they were used in `sense2vec`.
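As a rough sketch, registering a custom recipe via one of these entry points might look something like this in a package's `setup.py` (the package, module, and recipe names here are hypothetical; see the `sense2vec` repo for a real example):

```python
from setuptools import setup

setup(
    name="my_prodigy_plugins",            # hypothetical package name
    packages=["my_prodigy_plugins"],
    entry_points={
        # Group that Prodigy scans for recipe functions
        "prodigy_recipes": [
            # "<recipe name> = <module>:<function>"
            "my.recipe = my_prodigy_plugins.recipes:my_recipe",  # hypothetical
        ],
    },
)
```

Once the package is installed in the same environment as Prodigy, the recipe should be picked up automatically and usable like a built-in one, e.g. `prodigy my.recipe ...`.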
Let me think about this more generally; I may have some suggestions.
In the meantime, I found a relevant post (which you may have read already):