Hello,
I am trying to create a custom recipe that does text classification with multiple choice while updating the default TextClassifier model at the same time. I am using the textcat_teach recipe as a base.
I have the code for the multiple choice + classification with predictions. I am also able to update the answer before it is written to the DB, so I can correct the label in case the prediction was not correct.
What I am not sure about is how to update the model on the fly with the annotation value (Accept/Reject), whether I am doing it right, and how to check that the update actually happened. Do I need to call "model.update" in my review_answers function?
Here is my config for the multiple choice and the update with my custom method:
return {
    "view_id": "blocks",
    "dataset": dataset,  # Name of dataset to save annotations
    "stream": stream,  # Incoming stream of examples
    "config": {
        "blocks": [
            {"view_id": "choice"},
        ],
        "lang": nlp.lang  # Additional config settings, mostly for app UI
    },
    "update": review_answers,  # Update callback, called with batch of answers
    "exclude": exclude  # List of dataset names to exclude
}
Then, my review_answers is this:
def review_answers(answers):
    # Overwrite the tag before writing to disk
    for eg in answers:
        # Update the label tag with the answer from the annotator
        if eg["answer"] == "accept" and len(eg["accept"]) > 0:
            eg["label"] = eg["accept"][0]
    model.update  # Is this the right way to update the model?
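To make the question concrete, here is a sketch of what I think the callback might need to look like. It rests on assumptions I have not confirmed: that model.update has to be called with a batch (not just referenced), that it expects binary examples carrying "label" and "answer" keys, and that options left unselected in the choice UI should count as rejects. The names choice_to_binary and make_update are hypothetical helpers of my own.

```python
import copy


def choice_to_binary(eg, labels):
    """Turn one answered choice task into one binary example per label.

    Assumption: the text classifier wants {"text", "label", "answer"} examples.
    """
    if eg.get("answer") != "accept":
        return []  # skip tasks that were rejected or ignored as a whole
    selected = set(eg.get("accept", []))
    binary = []
    for label in labels:
        new_eg = copy.deepcopy(eg)
        new_eg["label"] = label
        # Assumption: an unselected option is treated as a reject for that label
        new_eg["answer"] = "accept" if label in selected else "reject"
        binary.append(new_eg)
    return binary


def make_update(model, labels):
    """Build an update callback that corrects the label and updates the model."""

    def review_answers(answers):
        batch = []
        for eg in answers:
            # Same correction as before: store the annotator's first choice
            if eg.get("answer") == "accept" and eg.get("accept"):
                eg["label"] = eg["accept"][0]
            batch.extend(choice_to_binary(eg, labels))
        if batch:
            # Actually call update with the converted batch
            return model.update(batch)

    return review_answers
```

In the recipe this would be wired up as "update": make_update(model, label), again assuming label is the list of label strings passed to the recipe.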
The flow of my recipe is basically the same as the textcat_teach recipe:
# Load the stream from a JSONL file and return a generator that yields a
# dictionary for each example in the data.
stream = JSONL(source)
# Load the spaCy model
nlp = spacy.load(spacy_model)
# Initialize Prodigy's text classifier model, which outputs
# (score, example) tuples
model = TextClassifier(nlp, label)
if patterns is None:
    # No patterns are used, so just use the model to suggest examples
    # and only use the model's update method as the update callback
    predict = model
    update = model.update  # Do I need model.update here? "update" is not used afterwards
else:
    # Initialize the pattern matcher and load in the JSONL patterns.
    # Set the matcher to not label the highlighted spans, only the text.
    matcher = PatternMatcher(
        nlp,
        prior_correct=5.0,
        prior_incorrect=5.0,
        label_span=False,
        label_task=True,
    )
    matcher = matcher.from_disk(patterns)
    # Combine the text classification model and the matcher and interleave
    # their suggestions and update both at the same time
    predict, update = combine_models(model, matcher)
# Use the prefer_uncertain sorter to focus on suggestions that the model
# is most uncertain about (i.e. with a score closest to 0.5). The model
# yields (score, example) tuples and the sorter yields just the example
stream = prefer_uncertain(predict(stream))
stream = add_options(stream, label)
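The add_options helper is not shown above. A minimal sketch of what such a helper usually looks like, assuming label is a list of label strings and that the choice interface reads an "options" list of {"id", "text"} dicts from each task:

```python
def add_options(stream, labels):
    """Attach the multiple-choice options to every task in the stream."""
    for task in stream:
        # Build a fresh list per task so tasks never share one options object
        task["options"] = [{"id": label, "text": label} for label in labels]
        yield task
```

Because it is a generator, nothing is computed until the stream is consumed, which keeps it compatible with the sorted stream above.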
Thanks for any guidance you can provide.