textcat.correct is always annotating exclusive categories

Hi there,

I'm encountering a bug in Prodigy (version 1.14.9) with the textcat.correct recipe. It always starts with the message ":information_source: Annotating exclusive categories based on 'textcat_multilabel'" when I run this command: prodigy textcat.correct <dataset> <model_path> <source>

The issue arises despite the following model configuration:

...

[components]

[components.textcat_multilabel]
factory = "textcat_multilabel"
scorer = {"@scorers":"spacy.textcat_multilabel_scorer.v2"}
threshold = 0.5

[components.textcat_multilabel.model]
@architectures = "spacy.TextCatBOW.v2"
exclusive_classes = false
ngram_size = 1
no_output_layer = false
nO = null

...

The exclusive_classes property is set to false, which, according to your code, should not cause the categories to be annotated as exclusive.
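For anyone who wants to double-check what their own config declares, here is a minimal sketch that reads the exclusive_classes flag for each component out of a config fragment like the one above. It uses only the Python standard library and is not how Prodigy itself makes the decision; the function name textcat_exclusivity and the CONFIG_FRAGMENT constant are made up for illustration, and it assumes the flag lives in a [components.<name>.model] section as in the snippet above.

```python
import configparser
import json

# Fragment copied from the config above (elided parts left out).
CONFIG_FRAGMENT = """
[components]

[components.textcat_multilabel]
factory = "textcat_multilabel"
scorer = {"@scorers":"spacy.textcat_multilabel_scorer.v2"}
threshold = 0.5

[components.textcat_multilabel.model]
@architectures = "spacy.TextCatBOW.v2"
exclusive_classes = false
ngram_size = 1
no_output_layer = false
nO = null
"""


def textcat_exclusivity(config_text):
    """Hypothetical helper: map component name -> exclusive_classes value.

    Only looks at sections named [components.<name>.model] that
    actually define exclusive_classes.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "nO"
    parser.read_string(config_text)

    result = {}
    for section in parser.sections():
        parts = section.split(".")
        is_model_section = (
            len(parts) == 3 and parts[0] == "components" and parts[2] == "model"
        )
        if not is_model_section:
            continue
        raw = parser[section].get("exclusive_classes")
        if raw is not None:
            # Values in the config are JSON-formatted (false, null, "str", ...).
            result[parts[1]] = json.loads(raw)
    return result


if __name__ == "__main__":
    print(textcat_exclusivity(CONFIG_FRAGMENT))
```

With the fragment above, the helper reports False for textcat_multilabel, which is why the "exclusive categories" message is unexpected here.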

Additionally, I noticed that this bug was fixed in 1.14.7, yet I am encountering it in version 1.14.9.

Thanks for your help :slight_smile:

Hi @TimothePearce ,

Thanks for reporting the issue. It helped me realize that our 1.14.7 fix was addressing just a subset of possible spaCy textcat architectures, concretely spacy.TextCatEnsemble.v2, which is the default architecture.
We should be able to release the patch that covers other architectures very soon.

Hi @magdaaniol,

Thank you for your prompt response. Actually, spacy.TextCatBOW.v2 is the default architecture when using the command prodigy train -tmc <dataset>. This might be another issue worth reporting. :smile:

Hi @TimothePearce ,

Just wanted to let you know that the bug related to exclusive labels is fixed in Prodigy 1.14.10 released yesterday.

> spacy.TextCatBOW.v2 is the default architecture when using the command prodigy train -tmc <dataset>. This might be another issue worth reporting.

Actually, spaCy has different defaults for the textcat architecture depending on whether you use the textcat factory (e.g. when calling spaCy's add_pipe) or generate the config from the template using init config.

The main difference is that init config can configure a whole pipeline at once, so e.g. it tries to use a shared tok2vec and the textcat default spacy.TextCatBOW.v2, while the add_pipe defaults have to work in isolation, so they don't use a shared tok2vec and they use the textcat default spacy.TextCatEnsemble.v2.

Prodigy train uses init_config under the hood, which is why the textcat default you observe is spacy.TextCatBOW.v2.

You can of course change the default by passing a config file as an argument to train.
I admit my previous statement about defaults was incomplete - sorry for the confusion!


Hi @magdaaniol, thanks for releasing the fix and for your explanation of the differences from the spaCy init config CLI.