Efficient binary annotation using textcat.teach

Hi, I'm using the textcat.teach recipe for some straightforward annotation with only two labels, biased-language and neutral-language, as shown in the screenshot.


I started Prodigy with:
prodigy textcat.teach wordchoice en_core_web_lg prod.jsonl --label NEUTRAL-LANG,BIASED-LANG

As far as I understand the documentation, in textcat.teach Prodigy uses the model in the loop to pick one label, which is shown at the top of the card (e.g., in the screenshot, Prodigy suggested neutral-lang). Human annotators then accept the task if the shown label is correct in their opinion, and reject it if the label is incorrect.
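
If I read the docs right, each suggestion the recipe sends to the web app boils down to a task roughly like the one below (just a sketch from my side; I'm guessing at the exact field names):

```python
# Rough sketch of a single textcat.teach suggestion (field names are my guess,
# not an actual dump from Prodigy):
task = {
    "text": "The senator's reckless plan will obviously ruin the economy.",
    "label": "NEUTRAL-LANG",   # the label the model in the loop picked
    "meta": {"score": 0.48},   # the model's confidence, displayed on the card
}
# So the annotator's only job is: accept if the label fits the text, reject if not.
```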

I have two questions:

  1. Is that correct?
  2. In terms of manual labeling, I find that weighing both the predicted label and the text to decide which button to press (accept if the label matches the text, otherwise reject) creates a higher cognitive load than, for instance, always pressing accept for one label (e.g., biased-language) and always pressing reject for the other (e.g., neutral-language). How would I set that up (e.g., accept = biased-language, reject = neutral-language) while still using textcat.teach, including its active learning part?

Cheers,
Felix

Hi,

If your two classes are mutually exclusive (a text is either biased or neutral), can you just use one label? You could even get by with positive/neutral/negative if you use ignore for neutral, although that might not be advisable (as then you don't have a way to signal ignore).

Yes, they are mutually exclusive. So I would call Prodigy with:
prodigy textcat.teach wordchoice en_core_web_lg prod.jsonl --label BIASED-LANG
... and during annotation press accept if the text is biased, and reject if it is not?

I'm just wondering: since Prodigy will always predict BIASED-LANG (there are no other labels it could choose from), how does it learn what to predict?

spaCy's implementation for text classification will learn to predict a score between 0.0 and 1.0 for each given label. So having only one label shouldn't be a problem. If Prodigy suggests a text for BIASED-LANG and it's not biased, you hit reject and the feedback the model gets is "this should be 0.0". If you accept, the feedback is "this should be 1.0".
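
To make that concrete, here's a minimal sketch (not Prodigy's actual update logic, and assuming the spaCy v2-style training API) of how binary accept/reject answers become the 0.0/1.0 scores the textcat component is trained on:

```python
import spacy

# Minimal sketch, assuming the spaCy v2-style API; Prodigy's own update step
# is more involved, but the idea is the same.
nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
textcat.add_label("BIASED-LANG")
nlp.add_pipe(textcat)
optimizer = nlp.begin_training()

# Hypothetical annotations in the shape Prodigy exports (text + answer)
annotations = [
    {"text": "This reckless plan will obviously ruin everything.", "answer": "accept"},
    {"text": "The committee met on Tuesday to discuss the budget.", "answer": "reject"},
]

for eg in annotations:
    if eg["answer"] == "ignore":
        continue  # ignored examples carry no training signal
    score = 1.0 if eg["answer"] == "accept" else 0.0  # accept -> 1.0, reject -> 0.0
    nlp.update([eg["text"]], [{"cats": {"BIASED-LANG": score}}], sgd=optimizer)
```

So reject answers are just as informative as accepts here: they're what teach the model which texts should score close to 0.0.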