The sorting mechanism is designed to handle large input data sets, so it works in a streaming fashion, sorting chunks of data. This is a tricky balancing act, because we don't want to block the annotation feed while we search for the best examples.
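The chunked streaming idea can be sketched as a generator that buffers a window of scored examples, emits them best-first, and then refills. This is a hypothetical simplification to illustrate the trade-off, not the library's actual implementation; the `chunked_sorter` name and `chunk_size` parameter are made up here.

```python
def chunked_sorter(stream, chunk_size=64):
    """Buffer `chunk_size` (score, example) pairs, emit them best-first,
    then keep consuming the stream. Sorting only within a chunk means we
    never block the feed waiting to see the whole data set."""
    chunk = []
    for score, example in stream:
        chunk.append((score, example))
        if len(chunk) >= chunk_size:
            for _, eg in sorted(chunk, key=lambda t: t[0], reverse=True):
                yield eg
            chunk = []
    # Flush whatever is left at the end of the stream.
    for _, eg in sorted(chunk, key=lambda t: t[0], reverse=True):
        yield eg
```

A larger `chunk_size` gets you closer to a true global sort, at the cost of more latency before the first example reaches the annotator.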
The standard deviation we refer to is a running estimate, so it's an approximation: the calculation is similar to an exponential moving average. It's the standard deviation of the specified "figure of merit". With `prefer_high_scores`, the figure of merit is the score; with `prefer_uncertain`, it's the uncertainty.
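A running estimate like this can be maintained with an exponentially weighted update to the mean and variance. This is a generic sketch of the technique, not the library's exact formula; the `alpha` smoothing factor is an assumption.

```python
def update_running_stats(mean, var, value, alpha=0.1):
    """One exponentially weighted update of the running mean and
    variance of the figure of merit (the score or the uncertainty)."""
    diff = value - mean
    new_mean = mean + alpha * diff
    # Exponentially weighted variance update: old estimate decays,
    # the squared deviation of the new value is mixed in.
    new_var = (1 - alpha) * (var + alpha * diff ** 2)
    return new_mean, new_var

mean, var = 0.5, 0.0
for score in [0.9, 0.91, 0.88, 0.9]:
    mean, var = update_running_stats(mean, var, score)
std = var ** 0.5  # running estimate of the standard deviation
```

Because old values decay exponentially, the estimate adapts as the model's score distribution drifts during annotation.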
The `algorithm='probability'` setting does not try to adjust for the scale of the scores, and instead trusts them directly.
Let's say you're using `prefer_high_scores` and the scores are mostly coming in around 0.9. Under `algorithm='probability'`, each of these examples will have a high chance of being emitted. But under the moving average algorithm, most of these examples will be filtered out, and the sorter will look for examples that score higher than average.
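The contrast between the two behaviours can be sketched like this. Both emit rules below are illustrative assumptions: the probability rule treats the score itself as the chance of emitting, and the moving-average rule uses a hypothetical threshold of one running standard deviation above the running mean, which need not match the library's exact rule.

```python
import random

rng = random.Random(0)  # seeded for a reproducible sketch

def emit_probability(score):
    """algorithm='probability': the score is used directly as the
    chance that the example is shown to the annotator."""
    return rng.random() < score

def emit_moving_average(score, mean, std, k=1.0):
    """Moving-average style: only emit examples that beat the running
    mean by `k` running standard deviations (hypothetical rule)."""
    return score >= mean + k * std

# Scores clustered around 0.9: the probability rule emits nearly all
# of them, while the moving-average rule filters out anything that
# isn't clearly above average.
scores = [0.9, 0.91, 0.89, 0.9, 0.92]
prob_emitted = [s for s in scores if emit_probability(s)]
avg_emitted = [s for s in scores if emit_moving_average(s, mean=0.9, std=0.01)]
```

With the running mean already near 0.9, the moving-average rule keeps only the 0.91 and 0.92 examples, while the probability rule happily emits the whole cluster.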
So if your model is already producing reasonably well-calibrated estimates, you might want to use `algorithm='probability'`. But if you don't trust the scores directly, or you worry that as you click, your model might get 'stuck' with bad weights, the moving average sorter can be better.
Under imbalanced classes, you can generally expect the scores for the positive class to stay low, so the uncertainty and probability sorters will behave similarly.
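To see why, consider a common uncertainty measure: proximity of the score to 0.5 (an assumption here, chosen for illustration). When all the scores sit well below 0.5, ranking by uncertainty produces the same order as ranking by score.

```python
def uncertainty(score):
    """Hypothetical uncertainty measure: 1.0 at score 0.5,
    falling to 0.0 at scores of 0.0 or 1.0."""
    return 1.0 - abs(score - 0.5) * 2.0

# Under class imbalance, positive-class scores tend to stay low,
# so both rankings agree: higher score means closer to 0.5,
# which means higher uncertainty.
low_scores = [0.05, 0.2, 0.35, 0.1]
by_score = sorted(low_scores, reverse=True)
by_uncertainty = sorted(low_scores, key=uncertainty, reverse=True)
```

The two orderings only diverge once scores start crossing 0.5, at which point high score and high uncertainty pull in different directions.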