Hi!
I have a question about creating reject examples for the textcat.batch-train recipe.
I have positive (accept) examples, but I am creating the negative (reject) examples artificially, based on the positive ones. For example, I have these positive examples:
Health 11065
PhysicsSci 3833
Technology 3449
Environment 3139
Energy 3000
Biology 2324
Transport 1776
Agriculture 275
Space 33
Biotechnology 13
To create reject examples for Energy, for instance, I could take all the texts from the other categories and use them as reject examples for Energy. The problem is that some texts could legitimately be labelled with both Energy and Environment, so I would be confusing the model by telling it that all Environment texts are not Energy texts.
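To make this concrete, here is a minimal sketch of the naive approach I am describing. The function name make_binary_tasks and the sample texts are made up for illustration, and I am assuming each annotation is a JSON object with "text", "label" and "answer" keys:

```python
import json

def make_binary_tasks(examples, target_label):
    """Turn (text, label) pairs into accept/reject tasks for one target label.

    Hypothetical helper: every text from another category becomes a reject
    example, which is exactly the step that may mislabel multi-label texts.
    """
    tasks = []
    for text, label in examples:
        answer = "accept" if label == target_label else "reject"
        tasks.append({"text": text, "label": target_label, "answer": answer})
    return tasks

# Made-up sample data, one text per category.
examples = [
    ("Solar panels cut household bills", "Energy"),
    ("Wind farms and their effect on bird habitats", "Environment"),
    ("New vaccine trial results published", "Health"),
]

tasks = make_binary_tasks(examples, "Energy")
for task in tasks:
    print(json.dumps(task))  # one JSON object per line (JSONL)
```

The second text is the problematic case: it arguably belongs to both Energy and Environment, but this approach marks it as a reject for Energy.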
What could be the best strategy for creating reject examples?
Thanks!