Train doesn't use rejected text for binary classification

curious · March 17, 2020, 2:10pm

Maybe I still didn't get the Accept and Reject concept.

I have a binary classification problem. I used the Match recipe to annotate text with the label "x". In the result, every example has the label "x"; some have "answer": "accept" and others have "answer": "reject". My expectation was that both the "accept" and the "reject" examples would be used for training, with "accept" as the positive cases and "reject" as the negative cases. However, when I use the "train" recipe, only the "accept" cases are used in training. My questions are:

1. Is this the intended behavior?
2. How can I convert the "reject" cases to negative cases?

Thank you,

ines · March 17, 2020, 2:28pm

Thanks for the report! This looks like a regression that was introduced in the latest v1.9.8, which causes the binary annotations to be filtered incorrectly in the general train recipe. Sorry about that! (The fix was actually supposed to correct a problem that could lead to incorrect totals being reported.)

You can work around this by editing the train.py file and adding accept_only=False to the example loading for text classification:

textcat_examples = load_examples(DB, textcat_datasets, accept_only=False)

I'm just building the wheels for a small update that fixes this internally and improves a few other related things around how the examples are filtered.
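
To illustrate the intended meaning of accept and reject for binary text classification, here is a minimal standalone sketch. It is not Prodigy's internal code: the helper name to_textcat_examples is hypothetical, and it only assumes that each annotation is a dict with "text", "label" and "answer" keys, as described in the question. Accepted examples become positive cases for the label, rejected ones become negative cases, and anything else (such as "ignore") is skipped.

```python
from typing import Dict, Iterable, List, Tuple


def to_textcat_examples(tasks: Iterable[dict]) -> List[Tuple[str, Dict[str, float]]]:
    """Map binary accept/reject annotations to (text, cats) pairs.

    Hypothetical helper for illustration only, not part of Prodigy.
    """
    examples = []
    for task in tasks:
        answer = task.get("answer")
        if answer not in ("accept", "reject"):
            # Skip "ignore" and unanswered examples entirely.
            continue
        # accept -> the label applies (1.0), reject -> it does not (0.0)
        score = 1.0 if answer == "accept" else 0.0
        examples.append((task["text"], {task["label"]: score}))
    return examples


annotations = [
    {"text": "first text", "label": "x", "answer": "accept"},
    {"text": "second text", "label": "x", "answer": "reject"},
    {"text": "third text", "label": "x", "answer": "ignore"},
]
examples = to_textcat_examples(annotations)
```

With the sample annotations above, the first text ends up as a positive case for "x", the second as a negative case, and the ignored one is dropped.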

ines · March 17, 2020, 2:53pm

Also just released v1.9.9, which should solve the underlying problem and handle different annotation types (and the meanings of accept and reject depending on the data type) correctly.

curious · March 17, 2020, 2:59pm

Yes, your solution works! Thanks.
