text classification: binary v. mutually exclusive labels

I annotated data with the binary interface in Prodigy 1.10:
python -m prodigy textcat.manual dset-01 ./data.txt --label LABEL
I then tried to train on it with Prodigy 1.11:
python -m prodigy train --textcat dset-01
which results in the error:

ValueError: [E867] The 'textcat' component requires at least two labels because it uses mutually exclusive classes where exactly one label is True for each doc. For binary classification tasks, you can use two labels with 'textcat' (LABEL / NOT_LABEL) or alternatively, you can use the 'textcat_multilabel' component with one label.

I may have missed something, but the valid command for binary annotation is now
python -m prodigy textcat.manual dset-01 ./data.txt --label LABEL,NO_LABEL --exclusive

That gives you the choice annotation interface, which is a lot clumsier than the binary interface unless you hack the current interface. The new prodigy train --textcat also appears to be incompatible with binary annotations created in Prodigy < 1.11.
Did I miss something, or do I have to hack the annotations?

old-style binary annotation
{"text":"Some text.", "_input_hash":407870737, "_task_hash":333329508, "label":"REVIEW", "_view_id":"classification", "answer":"reject","_timestamp":1646130322}

new-style "binary" annotation
{"text":"some text", "_input_hash":1052042230, "_task_hash":-993239025, "options":[{"id":"LABEL","text":"LABEL"}, {"id":"NOLABEL","text":"NOLABEL"}], "_view_id":"choice", "config":{"choice_style":"single"}, "accept":["LABEL"],"answer":"accept","_timestamp":1646130757}

using Prodigy 1.11.7, spaCy 3.2

Hi! As the error message suggests, in spaCy v3 you should train the textcat_multilabel component if your data contains only one binary label. The textcat component expects multiple mutually exclusive labels, so it can't work with a single label. The train command supports the multilabel component via the --textcat-multilabel option, which takes the datasets to train from: https://prodi.gy/docs/recipes#train So if you change --textcat to --textcat-multilabel (for example, python -m prodigy train --textcat-multilabel dset-01), it should work as expected.
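
To make the difference between the two annotation styles concrete, here is a minimal Python sketch of how each record could be read as per-label category scores. It is an illustration only, not Prodigy's or spaCy's internal conversion code: the helper names binary_to_cats and choice_to_cats are hypothetical, and the sample records are trimmed versions of the JSON shown in the question. It assumes that in a binary annotation "reject" means the label does not apply, and that skipped or ignored answers carry no training signal.

```python
import json


def binary_to_cats(eg):
    """Hypothetical helper: read one old-style binary annotation.

    A single label is either true (accept) or false (reject), which is
    the shape a multilabel text classifier can learn from.
    """
    answer = eg.get("answer")
    if answer not in ("accept", "reject"):
        return None  # assumption: ignored answers are skipped
    return {eg["label"]: 1.0 if answer == "accept" else 0.0}


def choice_to_cats(eg):
    """Hypothetical helper: read one choice-style annotation.

    With several options and a single selection, exactly one label is
    true per text, which is the mutually exclusive case.
    """
    if eg.get("answer") != "accept":
        return None  # assumption: only accepted choice tasks are used
    accepted = set(eg.get("accept", []))
    return {
        opt["id"]: 1.0 if opt["id"] in accepted else 0.0
        for opt in eg["options"]
    }


# Trimmed versions of the two records from the question above.
old_style = json.loads(
    '{"text": "Some text.", "label": "REVIEW", "answer": "reject"}'
)
new_style = json.loads(
    '{"text": "some text", '
    '"options": [{"id": "LABEL", "text": "LABEL"}, '
    '{"id": "NOLABEL", "text": "NOLABEL"}], '
    '"accept": ["LABEL"], "answer": "accept"}'
)

print(binary_to_cats(old_style))  # {'REVIEW': 0.0}
print(choice_to_cats(new_style))  # {'LABEL': 1.0, 'NOLABEL': 0.0}
```

The old-style record only ever yields one label, which is why it fits textcat_multilabel and triggers E867 when passed to the mutually exclusive textcat component.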
