textcat.batch-train reject examples

MBSanchez · September 26, 2019, 9:07pm

Hi!

I have a question regarding creating reject examples for using the textcat.batch-train recipe.

I have examples with positive (accept) labels, but I am artificially creating the negative (reject) examples from the positive ones. For example:

I have the following numbers of positive examples per category:

Health 11065
PhysicsSci 3833
Technology 3449
Environment 3139
Energy 3000
Biology 2324
Transport 1776
Agriculture 275
Space 33
Biotechnology 13

To create reject examples for Energy, for example, I could take the texts from all the other categories and use them as rejects. The problem is that some texts could legitimately be labelled with both Energy and Environment, so I would be confusing the model by telling it that no Environment text is an Energy text.
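
A minimal sketch of the naive strategy described above may make the problem concrete. The toy texts and the function name `naive_binary_examples` are assumptions for illustration only; the output dictionaries follow the binary text classification task shape with "text", "label" and "answer" keys.

```python
# Hypothetical toy data: (text, single_label) pairs standing in for the real corpus.
EXAMPLES = [
    ("Solar farms cut emissions in coastal regions", "Energy"),
    ("Wind turbines and their effect on bird habitats", "Environment"),
    ("New vaccine trial results", "Health"),
]


def naive_binary_examples(examples, target):
    """Mark a text as accept if its single label equals target, else reject."""
    return [
        {
            "text": text,
            "label": target,
            "answer": "accept" if label == target else "reject",
        }
        for text, label in examples
    ]


energy_tasks = naive_binary_examples(EXAMPLES, "Energy")

# Texts that end up as rejects for Energy under the naive strategy.
rejects = [task["text"] for task in energy_tasks if task["answer"] == "reject"]
```

Here the wind turbine text lands in `rejects` for Energy only because its single stored label is Environment, even though it is arguably about energy as well. That is exactly the noisy negative the question is about.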

What could be the best strategy for creating reject examples?

Thanks!

ines · September 29, 2019, 11:39am

Hi! If you don't know whether a category applies or not, it's difficult to create labelled examples for it automatically (without actually labelling at least some of the data). If your categories are not actually mutually exclusive but you only have exclusive labels, it might be easiest to just select a subset of the annotations and re-label them with the remaining categories (e.g. using textcat.manual).
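
One way to prepare such a subset, as a rough sketch: draw a reproducible random sample and write it out as JSONL for re-annotation. The helper names `sample_for_relabelling` and `to_jsonl`, the toy data, and the choice to store the old label under "meta" are all assumptions here, not something a recipe requires.

```python
import json
import random

# Hypothetical toy data: (text, original_single_label) pairs.
EXAMPLES = [
    ("Solar farms cut emissions in coastal regions", "Energy"),
    ("Wind turbines and their effect on bird habitats", "Environment"),
    ("New vaccine trial results", "Health"),
    ("Battery chemistry for electric buses", "Transport"),
]


def sample_for_relabelling(examples, n, seed=0):
    """Pick up to n examples at random, keeping the old label as metadata."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(examples, min(n, len(examples)))
    return [
        {"text": text, "meta": {"original_label": label}}
        for text, label in chosen
    ]


def to_jsonl(tasks):
    """Serialize tasks as one JSON object per line."""
    return "\n".join(json.dumps(task) for task in tasks)


subset = sample_for_relabelling(EXAMPLES, n=2)
jsonl_text = to_jsonl(subset)
```

The resulting text could be saved to a file and loaded as the source for manual annotation. Keeping the original label alongside each text makes it easy to see which category it came from while deciding on the remaining ones.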

The latest version of Prodigy also supports training directly from the choice format (i.e. with a key "accept": [] mapped to a list of labels). So you don't necessarily need to create separate rejected examples anymore. You just want to make sure that you also provide examples where multiple labels apply to a text, so the model can learn that.
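
As a sketch of what that could look like, the snippet below merges single-label annotations into one choice-style task per text, with all applicable labels collected under "accept". The function name `to_choice_tasks` and the toy data are assumptions, and the exact fields your Prodigy version expects for training are worth checking against its documentation.

```python
# Hypothetical toy data: the same text may appear once per label that applies.
ANNOTATIONS = [
    ("Solar farms cut emissions in coastal regions", "Energy"),
    ("Solar farms cut emissions in coastal regions", "Environment"),
    ("New vaccine trial results", "Health"),
]

LABELS = ["Energy", "Environment", "Health"]


def to_choice_tasks(annotations, labels):
    """Group labels by text and emit one choice-format task per unique text."""
    accepted = {}
    for text, label in annotations:
        selected = accepted.setdefault(text, [])
        if label not in selected:
            selected.append(label)
    return [
        {
            "text": text,
            # Every task lists all available options, not just the selected ones.
            "options": [{"id": label, "text": label} for label in labels],
            "accept": selected,
            "answer": "accept",
        }
        for text, selected in accepted.items()
    ]


choice_tasks = to_choice_tasks(ANNOTATIONS, LABELS)
```

In this form, a label that is missing from "accept" is treated as not applying to that text, so the multi-label examples are what tell the model that Energy and Environment can co-occur.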
