While following along with the insults classifier video, at 9:56, right after you launch textcat and switch to the Prodigy annotation interface, it appears that you are annotating the entire example:
However, when I launch my textcat the same way (edited to follow the new API, as described in the YouTube comments):
prodigy textcat.teach my-new-dataset en_core_web_lg ./data/social_text_data_2.jsonl --label MY_LABEL --patterns ./data/my_seed_terms.jsonl
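For reference, a seed-terms file like ./data/my_seed_terms.jsonl can be produced with a short script. This is a minimal sketch that assumes Prodigy's JSONL match-pattern format (one object per line with "label" and "pattern" keys, where "pattern" is a list of token attribute dicts). The terms below and the helper names make_patterns, to_jsonl and write_patterns are hypothetical placeholders.

```python
import json
from pathlib import Path

# Hypothetical seed terms: replace with your own list.
SEED_TERMS = ["first seed", "second seed phrase"]
LABEL = "MY_LABEL"  # assumed to match the --label passed to textcat.teach


def make_patterns(terms, label):
    """Build one token-based match pattern per seed term.

    Each token is matched on its lowercase form, so matching is
    case-insensitive. Splitting on whitespace only approximates
    spaCy's tokenizer, which is fine for simple words and phrases.
    """
    return [
        {"label": label, "pattern": [{"lower": tok.lower()} for tok in term.split()]}
        for term in terms
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(rec) + "\n" for rec in records)


def write_patterns(path, terms, label):
    """Write the patterns file, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(to_jsonl(make_patterns(terms, label)), encoding="utf8")
```

Calling write_patterns("./data/my_seed_terms.jsonl", SEED_TERMS, LABEL) writes one line per term, e.g. {"label": "MY_LABEL", "pattern": [{"lower": "first"}, {"lower": "seed"}]}.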
The first example is like so:
And the second example is like so:
My goal is to classify the entire text, not just specific tokens or keyphrases, which doesn't seem to be what Prodigy is doing here (the highlighted words suggest that perhaps I'm labeling specific words? Or something?).
Additionally, for texts that contain many of my seed terms, this requires that I annotate each example multiple times.
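To illustrate the duplication, here is a hypothetical filter that keeps only one task per text and label and drops the highlighted match. It is a sketch under assumptions, not a Prodigy feature: it assumes each task is a dict with "text" and "label", optionally "_input_hash" (shared by tasks built from the same input) and "spans" (the highlighted pattern match). The function name one_task_per_text is made up for this example.

```python
from typing import Dict, Iterable, Iterator


def one_task_per_text(stream: Iterable[Dict]) -> Iterator[Dict]:
    """Yield at most one task per (input, label) pair, without highlights.

    Uses "_input_hash" as the input key when present and falls back to
    the raw text otherwise.
    """
    seen = set()
    for task in stream:
        key = (task.get("_input_hash", task["text"]), task.get("label"))
        if key in seen:
            continue  # same text and label already yielded: skip the duplicate
        seen.add(key)
        # Drop the match highlights so the whole text is judged as one unit.
        yield {k: v for k, v in task.items() if k != "spans"}
```

With a single label, two pattern-match tasks for the same text collapse into one task, so each text is shown only once.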
If I exclude the --patterns argument, my interface looks like yours in the video, but it seems a shame to skip bootstrapping with "seeds" entirely. As an opinionated aside: I much prefer "seeds" to "patterns" for textcat, since "patterns" sounds more related to categorizing specific tokens, spans, or entities, while "seeds" more clearly refers to vectors used to classify entire docs.
What am I doing wrong here?