How textcat.teach works under the hood

Great! Thanks for sharing the cause of the problem!

In this case, does accepting mean that we are going to put this example in the new training set? If so would we have to go in after textcat.teach and assign the correct label to it?

When you hit accept, you accept the example together with its label, so there's no need to relabel it afterwards. In the case of multilabel text classification, the model is updated with the information for this particular label only; it doesn't update anything about the remaining labels. The remaining labels will appear as separate binary decision tasks for this example.
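To make the binary semantics concrete, here's a minimal sketch of what an accepted task amounts to. The dictionary fields mirror Prodigy's JSON task format, but the exact fields can vary by version and recipe, and the helper function is hypothetical, purely for illustration:

```python
# A sketch of a binary textcat.teach annotation, assuming Prodigy's
# JSON task format (fields may differ across versions/recipes).
accepted_task = {
    "text": "The new phone has a great camera.",
    "label": "TECHNOLOGY",   # the single label this binary decision is about
    "answer": "accept",      # you confirmed that this label applies
}

# Accepting means: this (text, label) pair is a positive example.
# Other labels (e.g. "SPORTS") are untouched -- they would appear as
# separate binary questions for the same text later in the stream.
def is_positive_for(task, label):
    """Hypothetical helper: does this annotation assert `label` applies?"""
    return task["answer"] == "accept" and task["label"] == label

print(is_positive_for(accepted_task, "TECHNOLOGY"))  # True
print(is_positive_for(accepted_task, "SPORTS"))      # False
```

The key point is that a single accept carries information about one label only, not about the full label set.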

Also, I recommend this post, which explains in more detail the effects of binary annotations on the dataset, as well as how you can modify the selection of examples by choosing a different sorter (please note that it's an older post and references the deprecated batch-train recipe, but the general principle it describes is still valid). prefer_uncertain is a good default for most active learning scenarios, but if you have a very large number of labels in a multilabel scenario and you find yourself clicking through mostly negative examples, you may be better off collecting more positive examples first and switching back to uncertainty sampling once you have a stronger baseline model.
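To illustrate the difference between the two strategies, here's a toy sketch. It is not Prodigy's actual implementation (the real sorters work on generators and use an exponential moving average of scores internally); it only shows the ordering each strategy prefers on a stream of (score, example) tuples:

```python
# Toy illustration of two sorting strategies over scored examples.
# Each item is a (score, example) tuple, as produced by the model.
scored = [
    (0.95, {"text": "clearly positive"}),
    (0.50, {"text": "model is unsure"}),
    (0.05, {"text": "clearly negative"}),
]

def toy_prefer_uncertain(stream):
    # Uncertainty sampling: scores closest to 0.5 come first.
    return sorted(stream, key=lambda item: abs(item[0] - 0.5))

def toy_prefer_high_scores(stream):
    # High-score sampling: most confident positives first -- useful for
    # collecting positive examples when they are rare.
    return sorted(stream, key=lambda item: -item[0])

print([ex["text"] for _, ex in toy_prefer_uncertain(scored)])
# -> ['model is unsure', 'clearly positive', 'clearly negative']
print([ex["text"] for _, ex in toy_prefer_high_scores(scored)])
# -> ['clearly positive', 'model is unsure', 'clearly negative']
```

With many labels and few positives, uncertainty sampling tends to surface mostly negatives, which is why starting with high-score sampling can be the better bootstrap.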

If you'd like to apply a different sorter, you can modify the source code of the textcat.teach recipe available in your Prodigy installation path and change line 100 to:

stream.apply(lambda d: prefer_high_scores(predict(d)))

and also the import on line 22:

from ..components.sorters import prefer_uncertain, prefer_high_scores