Scoring and sorting all samples during textcat teach

Hi,

I have been trying to understand how active learning works in a teach recipe, specifically for the text classification case: textcat.teach. I have a couple of questions about it:

  1. Does the line stream = prefer_uncertain(model(stream)) (located at https://github.com/explosion/prodigy-recipes/blob/0037b32d954e0b1672f9dae1e8aa53ac0c9136e3/textcat/textcat_custom_model.py#L63) score and re-sort ALL samples in the input file (e.g. JSONL)? Or does it score and re-sort only batch_size samples at a time from the already annotated samples?
  2. For a highly imbalanced dataset (with 0 as the majority class in a binary classification task), is it better to use prefer_high_scores instead of prefer_uncertain to construct a more balanced dataset?
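
To make question 1 concrete, here is a minimal sketch of the two behaviours I am trying to tell apart. Everything in it is a hypothetical stand-in written for illustration (score_stream, sort_all, sort_per_batch, and the hard-coded scores); none of it is Prodigy's API or its actual implementation, and I do not know which option, if either, matches what prefer_uncertain does.

```python
from itertools import islice


def score_stream(examples):
    # Hypothetical stand-in for model(stream): lazily yields (score, example).
    for eg in examples:
        yield eg["score"], eg


def uncertainty_key(pair):
    # Scores closest to 0.5 count as the most uncertain, so they sort first.
    return abs(pair[0] - 0.5)


def sort_all(scored):
    # Option A: consume the whole stream, then sort every sample globally.
    return sorted(scored, key=uncertainty_key)


def sort_per_batch(scored, batch_size):
    # Option B: pull batch_size samples at a time and sort only within a batch.
    scored = iter(scored)
    while True:
        batch = list(islice(scored, batch_size))
        if not batch:
            return
        yield from sorted(batch, key=uncertainty_key)


# Made-up scores, purely to show that the two options order samples differently.
examples = [
    {"text": "a", "score": 0.9},
    {"text": "b", "score": 0.5},
    {"text": "c", "score": 0.2},
    {"text": "d", "score": 0.45},
]

all_sorted = [eg["text"] for _, eg in sort_all(score_stream(examples))]
batch_sorted = [eg["text"] for _, eg in sort_per_batch(score_stream(examples), 2)]
print(all_sorted)
print(batch_sorted)
```

With option A the whole file has to be scored before the first sample is shown; with option B only one batch is scored at a time, so the order depends on where each sample sits in the input file.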

Thanks!

A post was merged into an existing topic: Prodigy Active Learning prefer_uncertain mechanism