If you're using `textcat.teach` with patterns, you're essentially using two models: the text classifier, which will suggest examples based on the model's predictions, using the label(s) specified on the command line, and the pattern matcher, which will suggest examples based on your patterns file. So in your case, you've told the text classifier that you want to annotate `LABEL_B`, but it knows nothing about that label yet, so the suggestions you're seeing are all based on your patterns, which describe `LABEL_A`. Normally, the text classifier would eventually "kick in" and suggest examples for the label(s) as well – but this won't happen here, since you're only annotating `LABEL_A` and the model never learns anything about `LABEL_B`.
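Just for context, a typical call for this kind of workflow would look something like this (the dataset, model and file names here are placeholders, not necessarily what you used):

```bash
prodigy textcat.teach my_dataset en_core_web_sm ./data.jsonl \
  --label LABEL_B --patterns ./patterns.jsonl
```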
Btw, to tell where the examples you're annotating are coming from, you can check the bottom right corner of the annotation card. You'll either see a score (the model's prediction) or a pattern number (referring to a line in your current patterns file), indicating which pattern was used to produce the match.
To answer your questions more explicitly:
Your annotations aren't "wrong" – but they're also not as useful as they could be, because you've only annotated pattern matches for `LABEL_A` and didn't really get to work with the model. So I'd definitely suggest rerunning `textcat.teach`. Maybe start with a new dataset, so you can run separate experiments later on: one with only the new set, and one with the new and old sets combined (to see if it makes a difference).
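For example, something along these lines (the dataset name is just an example, and the comma-separated labels assume you want to work on both):

```bash
prodigy textcat.teach textcat_new en_core_web_sm ./data.jsonl \
  --label LABEL_A,LABEL_B --patterns ./patterns.jsonl
```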
The `--label` on the command line won't override anything in the patterns. The patterns are just a list of examples for the individual labels, to tell Prodigy: "If you come across a match for this, it might be label X."
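For example, entries in a patterns file look like this (each line has a label plus either a token-based match pattern or an exact string; the phrases here are made up):

```
{"label": "LABEL_A", "pattern": [{"lower": "some"}, {"lower": "phrase"}]}
{"label": "LABEL_A", "pattern": "another example phrase"}
```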
Prodigy could also handle cases like this better. Since we're parsing all patterns upfront anyway, the pattern matcher could at least warn you if one or more of the labels you pass in aren't present in your patterns. This would also let us filter the patterns by `--label` by default.
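Just to sketch the idea (this isn't actual Prodigy code, only an illustration of the kind of check I mean):

```python
import json

def check_pattern_labels(patterns_path, cli_labels):
    # Collect all labels that occur in the patterns file
    with open(patterns_path, encoding="utf8") as f:
        pattern_labels = {json.loads(line)["label"] for line in f if line.strip()}
    # Warn about labels passed on the command line that no pattern covers
    missing = set(cli_labels) - pattern_labels
    if missing:
        print(f"Warning: no patterns found for label(s): {', '.join(sorted(missing))}")

check_pattern_labels("./patterns.jsonl", ["LABEL_B"])
```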