Using active learning on a multiclass problem

I've configured my own model in order to use Prodigy's active learning option. My dataset is a multiclass problem with 3 classes. My question is whether we can use Prodigy's active learning workflow to display all 3 options, rather than the binary accept/reject choice?

Along with this, is there a way to avoid showing the user/annotator which label the model has predicted? I'm asking because I'd like to avoid as much annotator bias as possible.

Thanks,
JoAnn

hi @joannvuong!

Thanks for your post and welcome to the Prodigy community :wave:

Here's my answer to a similar question that was asked before:

That's a good question! Typically we'd assume the annotator wants to see the label, but I can see how certain types of tasks may call for removing it. Likely the best way would be to modify the CSS for .prodigy-label to hide the label. But let me talk to the team on Monday to see if there are other ideas!
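If you want to experiment with that, Prodigy's global_css config setting lets a recipe inject custom CSS into the app. A minimal sketch, assuming the label is rendered with the .prodigy-label class mentioned above (inspect the rendered page to confirm the selector):

# A minimal sketch: hide the model's predicted label in the UI
# by injecting custom CSS via Prodigy's "global_css" setting.
return {
    "view_id": "classification",
    "dataset": "custom_model_dataset",
    "stream": model_predicted_stream,
    "config": {
        "global_css": ".prodigy-label { display: none; }",
    },
}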

Hope this helps!

Hi @ryanwesslen,

Thanks for the quick response! We've looked at the recipes in detail and are still not making progress, so I was wondering if you could point us in the right direction. We've been using the textcat_custom_model recipe from your GitHub as an outline, but we're still having trouble getting the options to show up in the UI.

Here's the chunk of code where we are trying to get the options to appear in the UI:

from typing import Iterable, List

import torch
from setfit import SetFitModel


class DummyModel(object):
    def __init__(self, model_path: str, labels: List[str]):
        self.custom_model = SetFitModel.from_pretrained(model_path)
        self.labels = labels

    def make_predictions(self, unlabeled_stream: Iterable[dict]):
        # TODO: Use the current "best" model to make predictions
        # and return both the label (e.g., Affirmed, Denied, etc.)
        # and the "score" or confidence of that prediction
        for example in unlabeled_stream:
            sentence = [example["text"]]  # the SetFit model expects a list of texts
            predictions = self.custom_model(sentence)
            # Grab the max probability in the tensor as the confidence score
            score = torch.max(self.custom_model.predict_proba(sentence))

            # Trying to show all options in the UI
            options = [{"id": label, "text": label} for label in self.labels]
            example["options"] = options
            yield (score.item(), example)

We also added choice_style to the config, as shown in this part of the code below:

return {
    "view_id": "classification",  # Annotation interface to use
    "dataset": "custom_model_dataset",  # Name of dataset to save annotations
    "stream": model_predicted_stream,  # Incoming stream of examples
    "update": update,  # Update callback, called with batch of answers
    "config":  {
        "choice_style": "single",
        "labels": label,  # the labels for the manual interface
    }
}

Thanks!

You can use the model to filter or sort the stream of examples so that you prioritize which examples to annotate first, without passing the predictions into the examples themselves. That way, you can still prefer specific cases (maybe you've got a rare class you'd like to collect more examples for) without leaking any information to the annotator.
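For example, Prodigy ships with sorters like prefer_uncertain that consume a stream of (score, example) tuples, which is exactly the shape make_predictions yields above. A rough sketch, reusing the names from your recipe:

from prodigy.components.sorters import prefer_uncertain

# make_predictions yields (score, example) tuples; the sorter
# re-orders the stream toward uncertain scores and emits plain
# example dicts, so no prediction ever reaches the annotator.
model = DummyModel(model_path, labels)
stream = prefer_uncertain(model.make_predictions(unlabeled_stream))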

Your example looks about right; I think you're interested in using the choice view_id instead of the classification one.

The choice interface lets you select from your set of 3 options. Have you looked into that interface? If so, is there a reason why it might not work for you?
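Concretely, keeping your options logic as-is, a minimal sketch of the updated return value might look like this (dataset and variable names are taken from your snippet; choice_auto_accept is optional):

return {
    "view_id": "choice",  # renders the "options" added to each example
    "dataset": "custom_model_dataset",
    "stream": model_predicted_stream,
    "update": update,
    "config": {
        "choice_style": "single",    # allow exactly one selection
        "choice_auto_accept": True,  # optional: accept as soon as an option is picked
    },
}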