I was wondering: while I'm teaching my model separately for each class/label, how should I ignore examples?
For instance, if an example is pretty ambiguous for label A but pretty relevant for label B (which will be taught in the next iteration), should I reject the example or ignore it? So far I've been rejecting rather than ignoring. I would only ignore an example if I knew it wasn't relevant to my classification at all (none of my labels apply) and/or it had bad markup, broken links, a different language, etc.
What happens when you ignore an example in multi-label text classification? Is it excluded from being taught for the next class as well?
Sorry for the multiple questions, I was trying to be as clear as possible!
Hi! If you're annotating data for text classification, a "reject" will be interpreted as "these labels don't apply", so it's probably not what you want in your use case. If you hit "ignore", the example will be skipped and excluded from training.
If you're collecting binary annotations, it's easier to collect more fine-grained feedback for text + label combinations, because you can ignore text + label A and accept text + label B separately, and during training, the model won't be updated with information about text + label A. But that's more difficult if you're annotating all labels at once. I think in that case, it might be safest to just ignore the whole example and move on.
Typically, a single example doesn't matter that much on the scale of things and it's much more important to move through your examples and collect more annotations, rather than spending too much time on a single, slightly ambiguous decision.
Hi @ines and thanks for your feedback, I really appreciate it!
To be clearer, I was following the binary-annotation strategy for my multi-label text classification.
I create 5 different terms for the 5 different vocabularies I have, with textcat.teach.
Then I teach them to my temporary model, separately for each class.
And that's where my question above comes in.
I understand that, for example, rejecting an example while teaching label X means that label X doesn't apply to that example. But when I hit ignore on this example, would that example also be excluded for label Y (or Z, W, or any other label in my classification), which I will be teaching next?
Thanks again!!
When the model is updated in the loop, the example won't be used when you hit "ignore", so the model won't get any feedback for label Y for that example. When you train the model later and merge the collected annotations, the information for that label for that given example will be considered unknown, but other labels that you did collect annotations on can still be used for the same example. spaCy supports updating with incomplete annotations, with the missing value typically annotated as None or - under the hood.
Let's say you're annotating the same text multiple times with multiple labels and you select: X: accept, Y: reject, Z: accept. When the data is merged, this information becomes:
cats = {"X": 1.0, "Y": 0.0, "Z": 1.0}
Now if you select X: accept, Y: ignore, Z: reject, the result is:
cats = {"X": 1.0, "Y": None, "Z": 0.0}
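To make the merging step concrete, here's a minimal sketch in plain Python. It is not Prodigy's or spaCy's actual merging code: `merge_binary_annotations` is a hypothetical helper, and the example dicts with "text", "label" and "answer" keys are an assumption about how the binary annotations are stored. It only illustrates how an ignored label ends up as None while the other labels for the same text are kept.

```python
def merge_binary_annotations(annotations, labels):
    """Hypothetical helper (not part of Prodigy or spaCy).

    Merges one-label-at-a-time decisions into a single cats dict per text.
    Labels without an accept/reject decision stay None (unknown).
    """
    score = {"accept": 1.0, "reject": 0.0}
    merged = {}
    for eg in annotations:
        # Start every text with all labels unknown
        cats = merged.setdefault(eg["text"], {label: None for label in labels})
        # "ignore" (or any other answer) leaves the label as None
        if eg["answer"] in score:
            cats[eg["label"]] = score[eg["answer"]]
    return merged


# Assumed input format: one dict per text + label decision
annotations = [
    {"text": "some text", "label": "X", "answer": "accept"},
    {"text": "some text", "label": "Y", "answer": "ignore"},
    {"text": "some text", "label": "Z", "answer": "reject"},
]

merged = merge_binary_annotations(annotations, ["X", "Y", "Z"])
print(merged["some text"])  # {'X': 1.0, 'Y': None, 'Z': 0.0}
```

So ignoring the text for label Y doesn't throw away what you annotated for X and Z on that same text.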
Excellent example, exactly as detailed as I needed. Thank you very much!