Hi,
We have built a workflow where we use custom spaCy model configurations to initialise models from the very start of an annotation project (for example, to include custom tokenisation). For textcat annotation projects we initialise the models with blank textcat components, and we have run into a problem with this.
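For context, this is roughly how we create the blank models (a minimal sketch: the `"en"` language, the label names and the output path are placeholders, and our custom tokenisation setup is omitted):

```python
import spacy

# Start from a blank pipeline so we control every component from the
# first annotation round. Custom tokenisation rules would go here.
nlp = spacy.blank("en")

# Add the mutually exclusive textcat component and register the labels
# so the untrained model can already produce scores in the loop.
textcat = nlp.add_pipe("textcat")
textcat.add_label("POSITIVE")  # placeholder labels
textcat.add_label("NEGATIVE")

nlp.initialize()
nlp.to_disk("blank_textcat_model")
```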
If you use `textcat.correct` with two mutually exclusive classes and a blank model in the loop, the model suggests both labels with a score of 0.5. The UI only displays one radio button as selected, but the `getChoices` function in the React code actually returns both labels, so if the user makes no change and just clicks accept, the saved annotation contains both labels. This of course causes errors down the line when trying to train a model on the data.
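The underlying behaviour is easy to see outside Prodigy with a model built as above (`blank_textcat_model` is the placeholder path from the previous snippet):

```python
import spacy

# Load the blank-initialised pipeline and score an arbitrary text.
nlp = spacy.load("blank_textcat_model")
doc = nlp("Any example sentence")
print(doc.cats)
# An untrained exclusive textcat scores every label identically,
# e.g. {'POSITIVE': 0.5, 'NEGATIVE': 0.5}, so any selection logic that
# keeps labels with score >= 0.5 returns both labels at once.
```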
As a workaround we are currently setting the threshold to 0.51, but I wanted to report this anyway, as it's clearly a bug, even if it's perhaps a relatively unlikely edge case for most users.
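For illustration, here is my understanding of why 0.51 helps, as a simplified Python sketch rather than Prodigy's actual selection code:

```python
# Simplified sketch of the label pre-selection, not Prodigy's real code:
cats = {"POSITIVE": 0.5, "NEGATIVE": 0.5}  # scores from the blank model

threshold = 0.51  # our workaround value; the default behaves like 0.5
selected = [label for label, score in cats.items() if score >= threshold]
print(selected)
# With a 0.5 threshold both labels pass and both end up in the saved
# annotation; with 0.51 neither label is pre-selected, so the annotator
# has to pick one explicitly before accepting.
```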