From Choice annotations to binary annotations with Teach

The default text classification model (via spaCy) assumes that categories are not mutually exclusive – so if you update the model with a text plus a category, the update is only performed for that label, and all other labels are treated as unknown / missing values. Prodigy uses the same approach for binary NER annotations btw – my slides here show an example of this process.
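To make that more concrete, here's a rough sketch of what a single binary update could look like on the spaCy side. It assumes spaCy v2's update API, and the labels and example text are just made-up placeholders:

import spacy

# Sketch: a blank pipeline with two made-up labels
nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
nlp.add_pipe(textcat)
optimizer = nlp.begin_training()

# Binary "accept" for POSITIVE: NEGATIVE isn't in the cats dict at all,
# so it's treated as a missing value rather than as 0.0
nlp.update(["This is great"], [{"cats": {"POSITIVE": 1.0}}], sgd=optimizer)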

Yeah, this sounds reasonable. The uncertainty sampling is performed by the prefer_uncertain sorter, which takes a stream of (score, example) tuples and yields examples. Under the hood, it uses an exponential moving average to determine whether to send out an example or not. Instead of prefer_uncertain, you can also use the prefer_high_scores sorter, which has the same API, but prioritises high scores.
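Just to illustrate the interface, here's a minimal sketch of a sorter consuming (score, example) tuples. The scored examples are hard-coded dummies; in the real recipe, the model wrapper produces that stream for you:

from prodigy.components.sorters import prefer_uncertain

# dummy (score, example) tuples, normally produced by model(stream)
scored_stream = [
    (0.93, {"text": "clearly positive"}),
    (0.51, {"text": "the model is unsure about this one"}),
    (0.08, {"text": "clearly negative"}),
]

# yields the examples themselves, preferring scores close to 0.5
# (based on an exponential moving average of the scores it has seen)
stream = prefer_uncertain(scored_stream)
for eg in stream:
    print(eg["text"])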

So in recipes/textcat.py, you could update the teach recipe like this:

from prodigy.components.sorters import prefer_high_scores

# in the recipe, swap prefer_uncertain for prefer_high_scores:
stream = prefer_high_scores(model(stream))
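And if it helps to see it in context, here's a rough sketch of a standalone custom recipe wired up with prefer_high_scores. It assumes Prodigy v1.x's built-in TextClassifier annotation model and JSONL loader; the recipe name, dataset and file paths are placeholders:

import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.sorters import prefer_high_scores
from prodigy.models.textcat import TextClassifier

@prodigy.recipe("textcat.teach-high-scores")
def textcat_teach_high_scores(dataset, spacy_model, source, label):
    """Collect binary feedback on texts the model scores highly for a label."""
    nlp = spacy.load(spacy_model)                  # e.g. "en_core_web_sm"
    model = TextClassifier(nlp, label.split(","))  # scores the stream and handles updates
    stream = JSONL(source)                         # load raw examples from a JSONL file
    stream = prefer_high_scores(model(stream))     # prioritise high-scoring examples
    return {
        "view_id": "classification",  # annotation interface to use
        "dataset": dataset,           # dataset to save annotations to
        "stream": stream,             # scored and sorted stream of examples
        "update": model.update,       # update the model in the loop with the answers
    }

You'd then run it with the -F flag pointing to the file, e.g. prodigy textcat.teach-high-scores my_dataset en_core_web_sm ./texts.jsonl MY_LABEL -F recipe.py (again, all of those names are placeholders).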

Our prodigy-recipes repo also has a simplified version of the textcat.teach recipe with a bunch of comments explaining what's going on, so you might find it useful as a starting point for writing your own custom version as well: