Multilabel textcat dependency between labels

nix411 · January 28, 2022, 12:55pm

I've had a pretty good textcat model with just a single label, EARNINGS. My annotated data looks like this:

[
  {
    ...
    "answer": "accept",
    "label": "EARNINGS"
  },
  {
    ...
    "answer": "reject",
    "label": "EARNINGS"
  }
]

Using Prodigy 1.10.8, I simply trained the model with prodigy train textcat tags-earnings blank:en -o model and got great results.

Now I want to add more labels and allow multiple labels on the same document. So I've started labelling other categories, e.g. M&A, with a binary workflow, and saved the resulting data (in the same format as above) to a new dataset, tags-ma.

My question is whether my EARNINGS scores will be affected in any way by extending the model to also predict an M&A score. Can I expect the same performance on EARNINGS if I now run prodigy train textcat tags-earnings,tags-ma blank:en -o model?
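Conceptually, training on tags-earnings,tags-ma means combining the binary answers for each text into one set of per-label targets. The sketch below assumes each record has "text", "label" and "answer" keys, as in the sample above; merge_binary_annotations is a hypothetical helper that only illustrates the idea and is not how Prodigy merges datasets internally. A label that a text was never annotated for is left out (unknown) rather than assumed negative.

```python
from collections import defaultdict


def merge_binary_annotations(*datasets):
    """Combine binary accept/reject annotations into per-text label targets.

    Hypothetical helper: each record is assumed to be a dict with
    "text", "label" and "answer" keys. "accept" becomes 1.0,
    "reject" becomes 0.0, and any other answer (e.g. "ignore") is skipped.
    Labels a text was never annotated for stay absent (unknown).
    """
    merged = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            answer = record.get("answer")
            if answer == "accept":
                merged[record["text"]][record["label"]] = 1.0
            elif answer == "reject":
                merged[record["text"]][record["label"]] = 0.0
    return dict(merged)


# Made-up example records standing in for the two Prodigy datasets.
earnings = [
    {"text": "Q4 revenue beat estimates.", "label": "EARNINGS", "answer": "accept"},
    {"text": "Acme agrees to buy Widget Co.", "label": "EARNINGS", "answer": "reject"},
]
ma = [
    {"text": "Acme agrees to buy Widget Co.", "label": "M&A", "answer": "accept"},
]

print(merge_binary_annotations(earnings, ma))
```

Keeping unannotated labels absent, instead of filling them with 0.0, avoids teaching the model that every EARNINGS-only text is definitely not M&A.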

If I understand this comment correctly, each label is completely independent, but I'd like to be sure.

Bonus question: does each additional label take extra compute when predicting the scores?

ines · January 31, 2022, 12:15pm

If your labels aren't mutually exclusive, then yes. If your labels are mutually exclusive and you train a regular textcat component, you could in theory end up with ambiguous examples where the model trained on only one label would have predicted that label, but now considers the second label more likely. But it sounds like that's not what you're trying to do.
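The difference between the two cases comes down to how the scores are normalized. A non-exclusive (multilabel) setup typically gives each label its own sigmoid, so a label's probability depends only on its own logit; a mutually exclusive setup normalizes with a softmax, so a new label competes for probability mass. The sketch below uses made-up logits and assumes the EARNINGS logit stays fixed; in a real model, retraining with an extra label will still shift the learned weights somewhat.

```python
import math


def sigmoid_scores(logits):
    """Independent per-label probabilities (non-exclusive / multilabel)."""
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in logits.items()}


def softmax_scores(logits):
    """Probabilities that sum to 1 across labels (mutually exclusive)."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - peak) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Made-up logits, for illustration only.
one_label = {"EARNINGS": 2.0}
two_labels = {"EARNINGS": 2.0, "M&A": 3.0}

# Sigmoid: identical EARNINGS score with or without the M&A label.
print(sigmoid_scores(one_label)["EARNINGS"], sigmoid_scores(two_labels)["EARNINGS"])

# Softmax: EARNINGS drops from 1.0 to roughly 0.27 once M&A scores higher.
print(softmax_scores(one_label)["EARNINGS"], softmax_scores(two_labels)["EARNINGS"])
```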

You can profile it, but there shouldn't be a noticeable difference between one and two labels.
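One way to profile it is to time predictions on the same texts with each model. The time_predictions helper below is a hypothetical, standard-library-only sketch that times any callable over a list of texts; with spaCy you could pass something like lambda texts: list(nlp.pipe(texts)) for each model loaded with spacy.load(...), where the model paths are your own.

```python
import statistics
import time


def time_predictions(predict, texts, repeats=5):
    """Return the median wall-clock seconds for predict(texts) over several runs.

    `predict` is any callable taking a list of texts; using the median
    reduces the effect of one-off slow runs (e.g. warm-up or OS noise).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(texts)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in predictor so the sketch runs without a trained model.
def dummy_predict(texts):
    return [{"EARNINGS": 0.5, "M&A": 0.5} for _ in texts]


texts = ["Q4 revenue beat estimates."] * 1000
print(f"median: {time_predictions(dummy_predict, texts):.6f}s")
```

Comparing the medians for the one-label and two-label models on identical input gives a direct answer for your own hardware; a difference within run-to-run noise means the extra label costs little in practice.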
