Multi-label Text Classification Ignore example

zazos · October 27, 2020, 10:24am

I was wondering, while I'm teaching my model separately for each class/label, how should I ignore examples?

For instance, if an example is pretty ambiguous for label A but pretty relevant for label B (which will be taught in the next iteration), should I reject the example or ignore it? So far I've been rejecting rather than ignoring. I would ignore an example if I knew it wasn't relevant to my classification at all (it didn't fit any of my labels) and/or it had bad markup, broken links, a different language, etc.

What happens when you ignore an example in multi-label text classification? Is it excluded from being taught for the next class as well?
Sorry for the multiple questions, I was trying to be as clear as possible!

ines · October 28, 2020, 10:02am

Hi! If you're annotating data for text classification, a "reject" will be interpreted as "these labels don't apply", so it's probably not what you want in your use case. If you hit "ignore", the example will be skipped and excluded from training.

If you're collecting binary annotations, it's easier to collect more fine-grained feedback for text + label combinations, because you can ignore text + label A and accept text + label B separately. During training, the model then won't be updated with information about text + label A. But that's more difficult if you're annotating all labels at once. I think in that case, it might be safest to just ignore the whole example and move on.

Typically, a single example doesn't matter that much on the scale of things and it's much more important to move through your examples and collect more annotations, rather than spending too much time on a single, slightly ambiguous decision.

zazos · October 29, 2020, 7:20am

Hi @ines and thanks for your feedback, really appreciate it!
To be clearer, I was following the binary-annotations strategy for my multi-label text classification.
I create 5 different terms for the 5 different vocabularies I have, with textcat.teach.
Then I teach them to my temporary model, separately for each class.
And that's where my question above comes in.
I understand that, for example, rejecting an example while teaching for label X means that label X doesn't apply to that example. But when I hit ignore on this example, would that example also be excluded for label Y (Z, W or any other label of my classification), which I will be teaching next?
Thanks again!!

ines · October 29, 2020, 8:35am

When the model is updated in the loop, the example won't be used when you hit "ignore", so the model won't get any feedback for label Y for that example. When you train the model later and merge the collected annotations, the information for that label for that given example will be considered unknown, but other labels that you did collect annotations on can still be used for the same example. spaCy supports updating with incomplete annotations, with the missing value typically annotated as None or - under the hood.

Let's say you're annotating the same text multiple times with multiple labels and you select: X: accept, Y: reject, Z: accept. When the data is merged, this information becomes:

cats = {"X": 1.0, "Y": 0.0, "Z": 1.0}

Now if you select X: accept, Y: ignore, Z: reject, the result is:

cats = {"X": 1.0, "Y": None, "Z": 0.0}
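As a rough illustration of the merge described above, here is a minimal sketch in plain Python. It is not Prodigy's actual merge code: the function name `merge_binary_annotations` and the `ANSWER_TO_SCORE` mapping are hypothetical, and the input is assumed to be a list of binary decisions with "text", "label" and "answer" keys.

```python
from collections import defaultdict

# Hypothetical mapping from a binary decision to a category value.
# "ignore" becomes None, i.e. the value for that label is unknown.
ANSWER_TO_SCORE = {"accept": 1.0, "reject": 0.0, "ignore": None}


def merge_binary_annotations(annotations, labels):
    """Combine per-label decisions on the same text into one cats dict per text."""
    # Every label starts out as None (unknown) until a decision is seen for it.
    merged = defaultdict(lambda: {label: None for label in labels})
    for eg in annotations:
        merged[eg["text"]][eg["label"]] = ANSWER_TO_SCORE[eg["answer"]]
    return dict(merged)


annotations = [
    {"text": "some text", "label": "X", "answer": "accept"},
    {"text": "some text", "label": "Y", "answer": "ignore"},
    {"text": "some text", "label": "Z", "answer": "reject"},
]
result = merge_binary_annotations(annotations, ["X", "Y", "Z"])
print(result["some text"])  # {'X': 1.0, 'Y': None, 'Z': 0.0}
```

In this sketch, a label that was never annotated for a text also stays None, so it is treated the same way as an ignored one.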

zazos · October 29, 2020, 9:05am

Excellent example, as detailed as I needed. Thank you very much!
