To make any model integrate with Prodigy’s active learning workflow, you mainly need to expose two functions:
- a `predict` function that takes an iterable stream of examples in Prodigy's JSON format, scores them and yields `(score, example)` tuples
- an `update` callback that takes a list of annotated examples and updates the model accordingly
Here’s a pseudocode example of how this could look in a custom text classification recipe. How you implement the individual components of course depends on the specifics of your model.
```python
import copy
import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.sorters import prefer_uncertain

@prodigy.recipe('custom')
def custom_recipe(dataset, source):
    stream = JSONL(source)
    model = load_your_model()

    def predict(stream):
        for eg in stream:
            # score each example and create one task per label
            predictions = get_predictions_from_model(eg)
            for label, score in predictions:
                example = copy.deepcopy(eg)
                example['label'] = label
                yield (score, example)

    def update(answers):
        # update the model with the annotated examples
        for eg in answers:
            if eg['answer'] == 'accept':
                update_model_with_accept(eg)
            elif eg['answer'] == 'reject':
                update_model_with_reject(eg)
        loss = get_loss()
        return loss

    return {
        'dataset': dataset,                            # dataset to save annotations to
        'view_id': 'classification',                   # annotation interface to use
        'stream': prefer_uncertain(predict(stream)),   # sorted stream of scored examples
        'update': update                               # update callback
    }
```
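To give an intuition for what the sorter contributes: `prefer_uncertain` consumes the `(score, example)` tuples and prioritizes examples the model is least sure about, i.e. those with scores closest to 0.5 (internally it calibrates with a moving average of the scores rather than a fixed cutoff). The sketch below is only the core idea with a hypothetical fixed threshold, not Prodigy's actual implementation:

```python
def prefer_uncertain_sketch(scored_stream, threshold=0.2):
    # Yield only examples whose score is close to the decision
    # boundary at 0.5 -- the ones the model is most uncertain about.
    # (Prodigy's real sorter adapts to the score distribution instead
    # of using a fixed threshold like this.)
    for score, example in scored_stream:
        if abs(score - 0.5) < threshold:
            yield example

scored = [
    (0.95, {"text": "clearly positive"}),
    (0.52, {"text": "hard to call"}),
    (0.08, {"text": "clearly negative"}),
]
selected = [eg["text"] for eg in prefer_uncertain_sketch(scored)]
print(selected)  # -> ['hard to call']
```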
You can also find more details on the expected formats and component APIs in your PRODIGY_README.html or in the custom recipes workflow.