Is batch training on labels mutually exclusive?

bigbeaker · April 15, 2019, 7:59am

I have a dataset that I have annotated with multiple labels.
When I run textcat.batch-train, should I expect the performance for each label to be the same as when training on a dataset with just a single label?

For example:
dataset A: 2 labels, HOTDOG & NOTHOTDOG
dataset B: 1 label, HOTDOG
dataset C: 1 label, NOTHOTDOG

Would running:
textcat.batch-train model datasetA
have the same performance as training 2 separate models on datasets B & C and combining their outputs?

With textcat.batch-train, is each label trained separately, or is there any ‘leak’ between labels?

I'm currently comparing these empirically, but some insight and tips would be great.

ines · April 15, 2019, 8:41am

Hi! The built-in textcat recipes use spaCy’s text classifier implementation, which currently expects the labels to be not mutually exclusive. So in theory, an example could be both hotdog and not hotdog. Of course, for a binary classification task like your example, this should be easy to work around by annotating and training only one label, HOTDOG.

spaCy v2.1 introduced the option to make the labels mutually exclusive – so in the next update of Prodigy, you’ll be able to specify this when you annotate and train a model. Depending on what you want to do, you might also find that a different text classification implementation just works better on your problem. In that case, you can export the data from Prodigy and train your model separately, or plug it in via a custom recipe to annotate with a model in the loop. Just make sure your model implementation is sensitive enough to updates.
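
To make the distinction concrete, here is a minimal sketch of how per-label scores behave in the two settings. It is an illustration only, not spaCy's or Prodigy's actual implementation: the function names and the raw scores are hypothetical, and it assumes the common setup of one sigmoid per label for non-exclusive labels versus a softmax across labels for mutually exclusive ones.

```python
import math

def independent_scores(raw):
    """Non-exclusive labels: one sigmoid per label.
    Each score is computed on its own, so the scores need not sum to 1."""
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in raw.items()}

def exclusive_scores(raw):
    """Mutually exclusive labels: softmax across all labels.
    The scores compete with each other and always sum to 1."""
    top = max(raw.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - top) for label, z in raw.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw model outputs for a single text
raw = {"HOTDOG": 2.0, "NOTHOTDOG": 1.5}

print(independent_scores(raw))  # both labels can score high at the same time
print(exclusive_scores(raw))    # the labels share a total probability of 1
```

With independent scores, a text can score high for both HOTDOG and NOTHOTDOG, which is why annotating and training only one label is the simpler route for a binary task.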

bigbeaker · April 15, 2019, 8:45am

pgy textcat.batch-train HOTDOG models --label 'POS'

Okay great, thanks for clarifying.
I was just going through the docs again: does adding the label flag to batch-train override this behaviour in the current version?

Like so:
pgy textcat.batch-train data models --label 'label'

gladiator · July 4, 2019, 4:41pm

Hi, Do you have an update or an expected date when the mutually exclusive option will be available in Prodigy?

ines · July 7, 2019, 6:29pm

That was already released a while ago as part of Prodigy v1.8.x.

gladiator · July 9, 2019, 2:17pm

I do have v1.8.2, but I don't see any option to make labels mutually exclusive in textcat.teach. Could you point me towards the latest documentation?
