Prodigy Active Learning prefer_uncertain mechanism

The sorting mechanism is designed to handle large input datasets, so it works in a streaming fashion, sorting chunks of data as they come in. It's a tricky balancing act, because we don't want to block the annotation feed while we search for the best examples.
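To make the streaming idea concrete, here's a rough sketch (not Prodigy's actual code) of sorting a scored stream chunk by chunk, so annotators are never left waiting for the whole dataset to be scored. The function and parameter names are made up for illustration:

```python
from itertools import islice

def sort_in_chunks(scored_stream, chunk_size=64, emit_top=32):
    # Sketch only: pull a chunk of (score, example) tuples off the stream,
    # sort just that chunk, and pass the best examples on immediately so
    # the annotation feed is never blocked on scoring everything up front.
    scored_stream = iter(scored_stream)
    while True:
        chunk = list(islice(scored_stream, chunk_size))
        if not chunk:
            break
        chunk.sort(key=lambda item: item[0], reverse=True)
        for score, example in chunk[:emit_top]:
            yield example
```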

The standard deviation we refer to is a running estimate of the standard deviation. It's an approximation, calculated in a way that's similar to an exponential moving average. It's the standard deviation of the specified "figure of merit": in prefer_high_scores that's the score itself, and in prefer_uncertain it's the uncertainty.
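Here's a small sketch of what a running estimate in that spirit can look like. The exact update rule and smoothing factor are assumptions for illustration, not Prodigy's internals:

```python
def running_std(figures_of_merit, alpha=0.1):
    # Exponential-moving-average style estimates of the mean and standard
    # deviation of the figure of merit. alpha controls how quickly the
    # estimate forgets older values.
    mean, var = None, 0.0
    for x in figures_of_merit:
        if mean is None:
            mean = x
        else:
            diff = x - mean
            mean += alpha * diff                              # running mean
            var = (1 - alpha) * (var + alpha * diff * diff)   # running variance
        yield mean, var ** 0.5

# For prefer_high_scores the figure of merit is the score itself; for
# prefer_uncertain it would be the uncertainty derived from the score.
for mean, std in running_std([0.2, 0.8, 0.5, 0.9, 0.4]):
    print(round(mean, 3), round(std, 3))
```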

The algorithm='probability' setting does not try to adjust for the scale of the scores; it trusts them directly.
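A minimal sketch of that "trust the scores directly" idea, not Prodigy's actual implementation:

```python
import random

def probability_sorter(scored_stream):
    # Each example is emitted with probability equal to its figure of merit,
    # with no rescaling. (For prefer_uncertain the figure of merit would be
    # the uncertainty, not the raw score.)
    for figure_of_merit, example in scored_stream:
        if random.random() < figure_of_merit:
            yield example
```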

Let's say you're using prefer_high_scores and the scores are mostly coming in around 0.9. Under algorithm='probability', each of these examples will have a high chance of being emitted. But under the moving average algorithm, most of these examples will be filtered out, and the sorter will look for examples that score higher than the running average.
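To see the difference, here's a rough comparison under exactly that scenario. The emission rules here are assumptions for illustration (the moving-average rule uses the running mean plus one running standard deviation as the cut-off), not the exact thresholds Prodigy applies:

```python
import random

def probability_keep(score):
    # algorithm='probability': emit with probability equal to the score itself.
    return random.random() < score

def make_ema_keep(alpha=0.1):
    # Moving-average style sketch: only keep scores clearly above the running
    # average, using the running std as the margin.
    state = {"mean": None, "var": 0.0}
    def keep(score):
        if state["mean"] is None:
            state["mean"] = score
            return True
        emit = score > state["mean"] + state["var"] ** 0.5
        diff = score - state["mean"]
        state["mean"] += alpha * diff
        state["var"] = (1 - alpha) * (state["var"] + alpha * diff * diff)
        return emit
    return keep

scores = [random.gauss(0.9, 0.02) for _ in range(1000)]
ema_keep = make_ema_keep()
print("probability keeps:", sum(probability_keep(s) for s in scores))   # roughly 900
print("moving average keeps:", sum(ema_keep(s) for s in scores))        # far fewer
```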

So if your model is already producing reasonably well-calibrated estimates, you might want to use algorithm='probability'. But if you don't trust the scores directly, or you worry that as you click through, your model might get 'stuck' with bad weights, the moving average sorter can be better.
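Switching between the two is just the algorithm argument on the sorter. A minimal usage sketch, using a made-up fake_model as a stand-in for a real scorer; I believe the moving-average option is algorithm='ema', but check the sorters documentation for your Prodigy version:

```python
import random
from prodigy.components.sorters import prefer_uncertain, prefer_high_scores

def fake_model(stream):
    # Hypothetical stand-in for a real model: yields (score, example) tuples.
    for eg in stream:
        yield random.random(), eg

stream = [{"text": t} for t in ("one", "two", "three")]

# Trust the scores directly:
feed = prefer_uncertain(fake_model(stream), algorithm="probability")

# Or let the moving-average sorter adapt to the score distribution:
# feed = prefer_high_scores(fake_model(stream), algorithm="ema")

for eg in feed:
    print(eg["text"])
```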

Under imbalanced classes, you can generally expect the scores for the positive class to stay low, well below 0.5. In that range the uncertainty rises with the score, so sorting by uncertainty and sorting by the score directly will behave similarly.
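A tiny illustration of why, using an assumed uncertainty measure (distance of the score from 0.5), which may not match Prodigy's exact formula:

```python
def uncertainty(score):
    # Assumed measure: 1.0 at score=0.5, falling to 0.0 at score=0.0 or 1.0.
    return 1.0 - abs(score - 0.5) * 2

for score in (0.05, 0.1, 0.2, 0.4):
    # While scores stay below 0.5, uncertainty rises with the score, so
    # ranking by uncertainty and ranking by the score give the same order.
    print(score, round(uncertainty(score), 2))
```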