Efficient binary annotation using textcat.teach

Hi, I'm using the textcat.teach recipe for some straightforward annotation with only two labels, biased-language and neutral-language, as shown in the screenshot.


I started Prodigy with:
prodigy textcat.teach wordchoice en_core_web_lg prod.jsonl --label NEUTRAL-LANG,BIASED-LANG

As far as I understand the documentation, in textcat.teach Prodigy uses the model in the loop to pick one label, which is shown at the top of the card (e.g., in the screenshot, Prodigy suggested neutral-lang). Human annotators then accept the task if the shown label is correct in their opinion, and reject it if the label is incorrect.
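
If I read the docs right, each suggestion the recipe sends to the web app boils down to a task roughly like the one below (just a sketch from my side; I'm guessing at the exact field names):

```python
# Rough sketch of a single textcat.teach suggestion (field names are my guess,
# not an actual dump from Prodigy):
task = {
    "text": "The senator's reckless plan will obviously ruin the economy.",
    "label": "NEUTRAL-LANG",   # the label the model in the loop picked
    "meta": {"score": 0.48},   # the model's confidence, displayed on the card
}
# So the annotator's only job is: accept if the label fits the text, reject if not.
```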

I have two questions:

  1. Is that correct?
  2. In terms of manual labeling, I find that weighing both the predicted label and the text to decide which button to press (accept if the label matches the text, otherwise reject) creates a higher cognitive load than, for instance, always pressing accept for one label (e.g., biased-language) and always pressing reject for the other (e.g., neutral-language). How would I set that up (e.g., accept = biased-language, reject = neutral-language) while still using textcat.teach, including its active learning part?

Cheers,
Felix

Hi,

If your two classes are mutually exclusive (a text is either biased or neutral), can you just use one label? You could even get by with positive/neutral/negative if you use ignore for neutral, although that might not be advisable (as then you don't have a way to signal ignore).

Yes, they are mutually exclusive. So I would call Prodigy with:
prodigy textcat.teach wordchoice en_core_web_lg prod.jsonl --label BIASED-LANG
... and during annotation press accept if the text is biased, and reject if it is not?

I'm just wondering: since Prodigy will always predict BIASED-LANG (there are no other labels it could choose from), how does it learn what to predict?

spaCy's implementation for text classification will learn to predict a score between 0.0 and 1.0 for each given label. So having only one label shouldn't be a problem. If Prodigy suggests a text for BIASED-LANG and it's not biased, you hit reject and the feedback the model gets is "this should be 0.0". If you accept, the feedback is "this should be 1.0".
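
To make that concrete, here's a minimal sketch (not Prodigy's actual update logic, and assuming the spaCy v2-style training API) of how binary accept/reject answers become the 0.0/1.0 scores the textcat component is trained on:

```python
import spacy

# Minimal sketch, assuming the spaCy v2-style API; Prodigy's own update step
# is more involved, but the idea is the same.
nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
textcat.add_label("BIASED-LANG")
nlp.add_pipe(textcat)
optimizer = nlp.begin_training()

# Hypothetical annotations in the shape Prodigy exports (text + answer)
annotations = [
    {"text": "This reckless plan will obviously ruin everything.", "answer": "accept"},
    {"text": "The committee met on Tuesday to discuss the budget.", "answer": "reject"},
]

for eg in annotations:
    if eg["answer"] == "ignore":
        continue  # ignored examples carry no training signal
    score = 1.0 if eg["answer"] == "accept" else 0.0  # accept -> 1.0, reject -> 0.0
    nlp.update([eg["text"]], [{"cats": {"BIASED-LANG": score}}], sgd=optimizer)
```

So reject answers are just as informative as accepts here: they're what teach the model which texts should score close to 0.0.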