I've configured my own model in order to use Prodigy's active learning option. My dataset is a multiclass problem with 3 classes. My question is whether we can use Prodigy's active learning method to display the 3 options, rather than the binary accept/reject choice?
Along with this, is there a way to avoid showing the user/annotator the label the model has predicted? I ask because I would like to avoid as much annotator bias as possible.
Thanks for your post and welcome to the Prodigy community!
Here's a similar question that has been asked before:
That's a good question! Typically we would've assumed the annotator wants to see the label, but I can see that certain types of tasks may want to hide it. Likely the best way would be to modify the CSS for .prodigy-label so the label doesn't render, as in the sketch below. But let me talk to the team on Monday to see if there are other ideas!
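For what it's worth, one way to try the CSS route is through the "global_css" setting in the recipe's config. This is only a rough sketch: the .prodigy-label selector may vary across Prodigy versions, so it's worth checking in your browser's dev tools.

return {
    "view_id": "classification",
    "dataset": "custom_model_dataset",
    "stream": model_predicted_stream,
    "config": {
        # Assumption: this selector matches the label element in your version.
        # It hides the predicted label from the annotator.
        "global_css": ".prodigy-label { display: none; }",
    },
}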
Thanks for the quick response! We've looked at the recipes in detail and are still not making progress, so I was wondering if you could point us in the right direction. We've been using the textcat_custom_model recipe on your GitHub as an outline, but we still have trouble getting the options to show up in the UI.
Here's the chunk of code where we are trying to get the options to appear in the UI:
from typing import Iterable, List

import torch
from setfit import SetFitModel


class DummyModel(object):
    def __init__(self, model_path: str, labels: List[str]):
        self.custom_model = SetFitModel.from_pretrained(model_path)
        self.labels = labels

    def make_predictions(self, unlabeled_stream: Iterable[dict]):
        # TODO: Use the current "best" model to make predictions
        # and return both the label (e.g. Affirmed, Denied, etc.)
        # and the "score" or confidence of that prediction.
        for example in unlabeled_stream:
            sentence = [example["text"]]  # the SetFit model expects the text in a list
            predictions = self.custom_model(sentence)
            # Grab the highest probability in the tensor as the confidence score.
            score = torch.max(self.custom_model.predict_proba(sentence))
            # Trying to show all options in the UI.
            options = [{"id": label, "text": label} for label in self.labels]
            print("options", options)
            example["options"] = options
            yield (score.item(), example)
We also added choice_style to the config, in the part of the recipe shown below:
return {
    "view_id": "classification",        # Annotation interface to use
    "dataset": "custom_model_dataset",  # Name of dataset to save annotations
    "stream": model_predicted_stream,   # Incoming stream of examples
    "update": update,                   # Update callback, called with batch of answers
    "config": {
        "choice_style": "single",
        "labels": label,  # the labels for the manual interface
    },
}
You can also use the model only to filter or sort the stream of examples, prioritizing which examples to annotate first without attaching the predictions to the examples themselves. That way you can still prefer specific cases (say, a rare class you'd like more examples of) without leaking any information to the annotator.
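As a hedged sketch of what that could look like with your DummyModel (reusing the (score, example) tuples it already yields), Prodigy's prefer_uncertain sorter consumes scored examples, re-orders them toward uncertain scores, and yields only the examples, so no prediction ever reaches the UI:

from prodigy.components.sorters import prefer_uncertain

# model.make_predictions yields (score, example) tuples. prefer_uncertain
# prioritizes examples whose scores are closest to 0.5 and drops the scores,
# so the annotator never sees what the model predicted. Note: with a
# max-probability score over 3 classes you may want to rescale the score
# so that "uncertain" really lands near 0.5.
scored_stream = model.make_predictions(unlabeled_stream)
model_predicted_stream = prefer_uncertain(scored_stream)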
Your example looks about right; I think you're interested in using the choice view_id instead of the classification one.
The choice interface allows you to select from a set of 3 choices. Have you looked into that interface? If so, is there a reason why it may not work for you?
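For reference, here's a minimal sketch of what your recipe's return dict could look like with the choice interface (model_predicted_stream and update are carried over from your snippet, and each example in the stream keeps the "options" list you're already setting):

return {
    "view_id": "choice",                # use the choice interface instead of classification
    "dataset": "custom_model_dataset",
    "stream": model_predicted_stream,   # each example carries its "options" list
    "update": update,
    "config": {
        "choice_style": "single",       # pick exactly one of the 3 labels per example
        "choice_auto_accept": True,     # optional: submit as soon as an option is picked
    },
}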