I have a dataset with multiple labels which I have annotated.
When I run textcat.batch-train, should I expect the performance for my labels to be the same as training on a dataset with just a single label?
dataset A: 2 labels, HOTDOG & NOTHOTDOG
dataset B: 1 label, HOTDOG
dataset C: 1 label, NOTHOTDOG
Would running textcat.batch-train on dataset A have the same performance as training 2 separate models on datasets B & C and combining their outputs?
When textcat.batch-train sees separate labels, is each label trained separately, or is there any 'leak' between them?
I'm currently comparing these empirically, but some insight and tips would be great.
Hi! The built-in textcat recipes use spaCy's text classifier implementation, which currently expects the labels to be *not* mutually exclusive. So in theory, an example could be both hotdog and not hotdog. Of course, for a binary classification task like your example, this should be easy to work around by annotating and training only one label.
spaCy v2.1 introduced the option to make the labels mutually exclusive – so in the next update of Prodigy, you’ll be able to specify this when you annotate and train a model. Depending on what you want to do, you might also find that a different text classification implementation just works better on your problem. In that case, you can export the data from Prodigy and train your model separately, or plug it in via a custom recipe to annotate with a model in the loop. Just make sure your model implementation is sensitive enough to updates.
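To illustrate the distinction in plain terms, here's a minimal, framework-free sketch (plain Python with made-up logit values, not spaCy's actual code): with non-exclusive labels each label gets an independent sigmoid score, so HOTDOG and NOTHOTDOG can both score high, whereas mutually exclusive labels share a softmax and compete for probability mass:

```python
import math

def sigmoid_scores(logits):
    """Non-exclusive labels: each label gets an independent
    probability, so several labels can score above 0.5 at once."""
    return {label: 1 / (1 + math.exp(-z)) for label, z in logits.items()}

def softmax_scores(logits):
    """Mutually exclusive labels: scores are jointly normalized
    and always sum to 1, so the labels compete."""
    exps = {label: math.exp(z) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw model outputs for a single example
logits = {"HOTDOG": 2.0, "NOTHOTDOG": 1.5}

independent = sigmoid_scores(logits)  # both labels can be > 0.5
exclusive = softmax_scores(logits)    # scores sum to exactly 1.0
```

With independent sigmoids, an update on the HOTDOG label doesn't directly push NOTHOTDOG down; with a softmax, the labels share the normalization, so raising one necessarily lowers the other, which is the behaviour the mutually exclusive option gives you.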
pgy textcat.batch-train HOTDOG models --label 'POS'
Okay, great, thanks for clarifying!
I was just going through the docs again: does adding the --label flag to batch-train override this behaviour in the current version?
pgy textcat.batch-train data models --label 'label'
Hi, do you have an update or an expected date for when the mutually exclusive option will be available in Prodigy?
That was already released a while ago as part of Prodigy v1.8.x.
I do have v1.8.2, but I don't see any option to make labels mutually exclusive in textcat.teach. Could you point me towards the latest documentation?