Meaning of “reject” in textcat.manual when training with textcat.batch-train

How does textcat.batch-train interpret the selected labels in an example where choices were selected and the example was annotated as “reject” in textcat.manual? Are the selected labels treated as negative examples? I’m a little confused, because the selected labels are stored under the key “accept” in the annotation data.

For example, after putting some examples through textcat.manual, I can have the following JSON for an accepted example:

{
  "text": "Here's some stuff",
  "_input_hash": -1534612082,
  "_task_hash": 1619050212,
  "options": [
    {
      "id": "FEVER",
      "text": "FEVER"
    },
    {
      "id": "URINARY_FREQUENCY",
      "text": "URINARY_FREQUENCY"
    },
    {
      "id": "DYSURIA",
      "text": "DYSURIA"
    }
  ],
  "_session_id": "CHOICE_TEST-default",
  "_view_id": "choice",
  "accept": ["FEVER", "DYSURIA"],
  "answer": "accept"
}

and the following JSON for a rejected example:

{
  "text": "Here's some other stuff",
  "_input_hash": -1534612083,
  "_task_hash": 1619050212,
  "options": [
    {
      "id": "FEVER",
      "text": "FEVER"
    },
    {
      "id": "URINARY_FREQUENCY",
      "text": "URINARY_FREQUENCY"
    },
    {
      "id": "DYSURIA",
      "text": "DYSURIA"
    }
  ],
  "_session_id": "CHOICE_TEST-default",
  "_view_id": "choice",
  "accept": ["FEVER", "DYSURIA"],
  "answer": "reject"
}

Are FEVER and DYSURIA positive examples in the first case and negative examples in the second, with URINARY_FREQUENCY ignored in both?

For context, the reason I’m curious is that “rejecting” a label and “accepting” it are both very important in my use case.


We’ve just pushed version 1.8.1, which corrects an issue with the “reject” examples for textcat.manual. It also corrects the other bug you found with textcat.batch-train. We now interpret “reject” as inverting the selection. Let’s say you have three labels: A, B and C. If you select B and click “Reject”, that’s the same as selecting A and C and clicking “Accept”.

This is mostly there for consistency with the other recipes, and I guess to match what an annotator might mean by clicking “Reject” like that. It could also be quicker if you know each example has one false label, as this way you can use multi-select. I would say it’s probably less confusing to stick to Accept in most situations, though.
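To make that inversion concrete, here is a rough Python sketch of the mapping. It’s purely illustrative: the helper name invert_reject is made up, and this is not Prodigy’s actual implementation.

def invert_reject(task):
    # Flip a rejected choice selection into the equivalent accepted one,
    # as described above. Non-rejected tasks pass through unchanged.
    if task.get("answer") != "reject":
        return task
    labels = [opt["id"] for opt in task["options"]]
    selected = set(task.get("accept", []))
    return dict(task, accept=[l for l in labels if l not in selected], answer="accept")

# Three labels A, B and C, with B selected and the answer "reject":
task = {"options": [{"id": l, "text": l} for l in "ABC"], "accept": ["B"], "answer": "reject"}
print(invert_reject(task)["accept"])  # ['A', 'C'] -- same as accepting A and C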

Great! Thanks for the explanation.

I think this means I will need to add a separate labeling task for rejecting labels. In my case, rejecting one label is not the same as accepting the others (e.g., just because we know someone is denying one thing does not mean they are claiming the other options).

You should be able to do this with a post-processing operation on the dataset, before you run textcat.batch-train.

A bit of background about the semantics of the cats gold-standard dictionary in spaCy: spaCy supports missing gold-standard values. If you don’t know whether some label applies to the text, you can make sure it’s missing from the cats dictionary. spaCy will then avoid updating that category either way.
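As a small illustration of those semantics, here’s a sketch against the spaCy v2-style update API that this version of Prodigy builds on (the training API looks different in spaCy v3); the label names are just the ones from the examples above:

import spacy

nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
for label in ("FEVER", "URINARY_FREQUENCY", "DYSURIA"):
    textcat.add_label(label)
nlp.add_pipe(textcat)
optimizer = nlp.begin_training()

# Only FEVER appears in the cats dict, marked as not applying (0.0).
# URINARY_FREQUENCY and DYSURIA are missing from the dict, so this
# update doesn't push the model either way on those two labels.
nlp.update(["Here's some other stuff"], [{"cats": {"FEVER": 0.0}}], sgd=optimizer)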

So I think you want labelling category A and clicking “Reject” to produce a cats dict like {"A": 0.0}. This will avoid updating labels B and C, and just make sure label A doesn’t apply. If you pass in a task object like {"text": "...", "cats": {"A": 0.0}, "answer": "accept"}, you should get these semantics. You should do this transformation yourself, though, as a post-process after the textcat.manual recipe; that prevents Prodigy from applying its own interpretation of “reject”, which doesn’t suit your intention.
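Here is a minimal sketch of that post-process, assuming the textcat.manual annotations have been exported to a JSONL file (e.g. with db-out). The file names, and the choice to mark accepted selections as 1.0 while leaving unselected options missing, are assumptions on top of the advice above, so adjust them to your own scheme:

import json

def to_cats_task(task):
    # The selected labels live under "accept" regardless of the answer.
    selected = task.get("accept", [])
    if task.get("answer") == "reject":
        # Rejected selection: the chosen labels explicitly don't apply
        # (0.0); the other options stay missing, i.e. unknown to spaCy.
        cats = {label: 0.0 for label in selected}
    else:
        # Accepted selection: the chosen labels apply (1.0); unselected
        # options are again left missing rather than forced to 0.0.
        cats = {label: 1.0 for label in selected}
    return {"text": task["text"], "cats": cats, "answer": "accept"}

with open("annotations.jsonl") as f_in, open("annotations_cats.jsonl", "w") as f_out:
    for line in f_in:
        if not line.strip():
            continue
        task = json.loads(line)
        if task.get("answer") not in ("accept", "reject"):
            continue  # skip ignored tasks
        f_out.write(json.dumps(to_cats_task(task)) + "\n")

The rewritten file can then be imported into a fresh dataset (for example with db-in) before running textcat.batch-train.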

Thanks for that! This is absolutely what I need to do.