Scoring and sorting all samples during textcat teach

atakanokan · October 30, 2020, 8:51pm

Hi,

I have been trying to understand how active learning works in a teach recipe, specifically for text classification (textcat.teach). I have a couple of questions about it:

1. Does the line stream = prefer_uncertain(model(stream)) (located at https://github.com/explosion/prodigy-recipes/blob/0037b32d954e0b1672f9dae1e8aa53ac0c9136e3/textcat/textcat_custom_model.py#L63) score and re-sort ALL samples in an input file (e.g. JSONL)? Or does it score and re-sort only batch_size samples from the already annotated samples?
2. For a highly imbalanced dataset (the majority class being 0 in a binary classification task), is it better to use prefer_high_scores instead of prefer_uncertain to construct a more balanced dataset?
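
To make both questions concrete, here is a rough sketch of my current mental model. It is not Prodigy's actual code: sketch_prefer_uncertain, sketch_prefer_high_scores, toy_model, the fake_score field and the cutoff parameters are all names I made up for illustration. It assumes the model wraps the stream in a generator of (score, example) tuples and that the sorter filters that generator lazily instead of sorting the whole file.

```python
import itertools
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Example = Dict[str, Any]
Scored = Tuple[float, Example]


def sketch_prefer_uncertain(scored: Iterable[Scored], min_uncertainty: float = 0.5) -> Iterator[Example]:
    """Hypothetical stand-in: lazily keep examples whose score is near 0.5."""
    for score, eg in scored:
        # 1.0 when score == 0.5, 0.0 when score is 0.0 or 1.0
        uncertainty = 1.0 - abs(score - 0.5) * 2.0
        if uncertainty >= min_uncertainty:
            yield eg


def sketch_prefer_high_scores(scored: Iterable[Scored], min_score: float = 0.5) -> Iterator[Example]:
    """Hypothetical stand-in: lazily keep examples the model scores highly."""
    for score, eg in scored:
        if score >= min_score:
            yield eg


consumed: List[str] = []  # records which examples the toy model has scored so far


def toy_model(stream: Iterable[Example]) -> Iterator[Scored]:
    """Toy model: reads a precomputed score instead of predicting one."""
    for eg in stream:
        consumed.append(eg["text"])
        yield eg["fake_score"], eg


examples = [
    {"text": "a", "fake_score": 0.95},
    {"text": "b", "fake_score": 0.55},
    {"text": "c", "fake_score": 0.05},
    {"text": "d", "fake_score": 0.45},
    {"text": "e", "fake_score": 0.50},
]

stream = sketch_prefer_uncertain(toy_model(iter(examples)))
# Pull a "batch" of two examples, as an annotation UI might.
first_batch = list(itertools.islice(stream, 2))
print([eg["text"] for eg in first_batch])
print(consumed)
```

In this sketch, pulling one batch only scores as many examples as are needed to fill it, and nothing is sorted globally. Whether the real prefer_uncertain and prefer_high_scores behave this way is exactly what I am asking.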

Thanks!

ines · November 2, 2020, 9:16am

A post was merged into an existing topic: Prodigy Active Learning prefer_uncertain mechanism
