Meaning of “reject” in textcat.manual when training with textcat.batch-train

How does textcat.batch-train interpret the selected labels in an example where choices were selected and the example was annotated as “reject” in textcat.manual? Are the selected labels treated as negative examples? I’m a little confused, because the selected labels are stored under the key “accept” in the annotation data.

For example, after putting some examples through textcat.manual, I can have the following JSON for an accepted example:

{
  "text": "Here's some stuff",
  "_input_hash": -1534612082,
  "_task_hash": 1619050212,
  "options": [
    {
      "id": "FEVER",
      "text": "FEVER"
    },
    {
      "id": "URINARY_FREQUENCY",
      "text": "URINARY_FREQUENCY"
    },
    {
      "id": "DYSURIA",
      "text": "DYSURIA"
    }
  ],
  "_session_id": "CHOICE_TEST-default",
  "_view_id": "choice",
  "accept": ["FEVER", "DYSURIA"],
  "answer": "accept"
}

and the following JSON for a rejected example:

{
  "text": "Here's some other stuff",
  "_input_hash": -1534612083,
  "_task_hash": 1619050212,
  "options": [
    {
      "id": "FEVER",
      "text": "FEVER"
    },
    {
      "id": "URINARY_FREQUENCY",
      "text": "URINARY_FREQUENCY"
    },
    {
      "id": "DYSURIA",
      "text": "DYSURIA"
    }
  ],
  "_session_id": "CHOICE_TEST-default",
  "_view_id": "choice",
  "accept": ["FEVER", "DYSURIA"],
  "answer": "reject"
}

Are FEVER and DYSURIA positive examples in the first case and negative examples in the second, with URINARY_FREQUENCY ignored in both?

For context, the reason I’m curious is that “rejecting” a label and “accepting” it are both very important in my use case.


We’ve just pushed version 1.8.1, which corrects an issue with the “reject” examples for textcat.manual. It also corrects the other bug you found with textcat.batch-train. We now interpret “reject” as inverting the selection. Let’s say you have three labels: A, B and C. If you select B and click “Reject”, that’s the same as selecting A and C and clicking “Accept”.

This is mostly there for consistency with the other recipes, and I guess to match what an annotator might mean by clicking “Reject” like that. It could also be quicker if you know each example has one false label, as this way you can use multi-select. I would say it’s probably less confusing to stick to Accept in most situations, though.
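To make that inversion concrete, here is a rough Python sketch of the mapping. It’s purely illustrative: the helper name invert_reject is made up, and this is not Prodigy’s actual implementation.

def invert_reject(task):
    # Flip a rejected choice selection into the equivalent accepted one,
    # as described above. Non-rejected tasks pass through unchanged.
    if task.get("answer") != "reject":
        return task
    labels = [opt["id"] for opt in task["options"]]
    selected = set(task.get("accept", []))
    return dict(task, accept=[l for l in labels if l not in selected], answer="accept")

# Three labels A, B and C, with B selected and the answer "reject":
task = {"options": [{"id": l, "text": l} for l in "ABC"], "accept": ["B"], "answer": "reject"}
print(invert_reject(task)["accept"])  # ['A', 'C'] -- same as accepting A and C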

Great! Thanks for the explanation.

I think this means I will need to add a separate labeling task for rejecting labels. In my case, rejecting one label is not the same as accepting the others (e.g., just because we know someone is denying one thing does not mean they are claiming the other options).

You should be able to do this with a post-processing operation on the dataset, before you run textcat.batch-train.

A bit of background about the semantics of the cats gold-standard dictionary in spaCy: spaCy supports missing gold-standard values. If you don’t know whether some label applies to the text, you can make sure it’s missing from the cats dictionary. spaCy will then avoid updating that category either way.
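As a small illustration of those semantics, here’s a sketch against the spaCy v2-style update API that this version of Prodigy builds on (the training API looks different in spaCy v3); the label names are just the ones from the examples above:

import spacy

nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
for label in ("FEVER", "URINARY_FREQUENCY", "DYSURIA"):
    textcat.add_label(label)
nlp.add_pipe(textcat)
optimizer = nlp.begin_training()

# Only FEVER appears in the cats dict, marked as not applying (0.0).
# URINARY_FREQUENCY and DYSURIA are missing from the dict, so this
# update doesn't push the model either way on those two labels.
nlp.update(["Here's some other stuff"], [{"cats": {"FEVER": 0.0}}], sgd=optimizer)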

So I think you want labelling category A and clicking “Reject” to produce a cats dict like {"A": 0.0}. This will avoid updating labels B and C, and just make sure label A doesn’t apply. If you pass in a task object like {"text": "...", "cats": {"A": 0.0}, "answer": "accept"}, you should get these semantics. You should do this transformation yourself, though, as a post-process after the textcat.manual recipe; that prevents Prodigy from applying its own interpretation of “reject”, which doesn’t suit your intention.
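Here is a minimal sketch of that post-process, assuming the textcat.manual annotations have been exported to a JSONL file (e.g. with db-out). The file names, and the choice to mark accepted selections as 1.0 while leaving unselected options missing, are assumptions on top of the advice above, so adjust them to your own scheme:

import json

def to_cats_task(task):
    # The selected labels live under "accept" regardless of the answer.
    selected = task.get("accept", [])
    if task.get("answer") == "reject":
        # Rejected selection: the chosen labels explicitly don't apply
        # (0.0); the other options stay missing, i.e. unknown to spaCy.
        cats = {label: 0.0 for label in selected}
    else:
        # Accepted selection: the chosen labels apply (1.0); unselected
        # options are again left missing rather than forced to 0.0.
        cats = {label: 1.0 for label in selected}
    return {"text": task["text"], "cats": cats, "answer": "accept"}

with open("annotations.jsonl") as f_in, open("annotations_cats.jsonl", "w") as f_out:
    for line in f_in:
        if not line.strip():
            continue
        task = json.loads(line)
        if task.get("answer") not in ("accept", "reject"):
            continue  # skip ignored tasks
        f_out.write(json.dumps(to_cats_task(task)) + "\n")

The rewritten file can then be imported into a fresh dataset (for example with db-in) before running textcat.batch-train.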

Thanks for that! This is absolutely what I need to do.