Multilabel textcat dependency between labels

I have a pretty good textcat model with just a single label, EARNINGS. My annotated data looks like this:

[
  {
    ...
    "answer": "accept",
    "label": "EARNINGS"
  },
  {
    ...
    "answer": "reject",
    "label": "EARNINGS"
  }
]

Using Prodigy 1.10.8, I simply trained the model with prodigy train textcat tags-earnings blank:en -o model and got great results.

Now I want to add more labels, allowing multiple labels on the same document. So I've started annotating a second label, e.g. M&A, with a binary workflow, which gives me data similar to the above in a new dataset, tags-ma.

My question is whether my EARNINGS scores will be affected in any way by extending the model to also give an M&A score. Can I expect the same performance on EARNINGS if I now run prodigy train textcat tags-earnings,tags-ma blank:en -o model?

If I understand this comment correctly, then each label is completely independent, but I'd like to be sure.

Bonus question: does predicting the scores take additional compute for each label?

If your labels aren't mutually exclusive, then yes. If your labels are mutually exclusive and you train a regular textcat component, you could in theory end up with ambiguous examples where the model trained on only one label would have predicted that label, but the model trained on both now considers the second label more likely. But it sounds like that's not what you're trying to do.
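
To make the difference concrete, here's a minimal sketch in plain Python. It assumes the non-exclusive case scores each label with its own sigmoid, while the exclusive case normalises all labels together with a softmax. The function names and the raw scores are made up for illustration and aren't taken from spaCy or Prodigy. It also holds the raw scores fixed, so it only shows how scores are combined at prediction time, not what happens to the shared weights during training.

```python
import math


def independent_scores(logits):
    """Non-exclusive labels: each raw score goes through its own sigmoid."""
    return {label: 1.0 / (1.0 + math.exp(-x)) for label, x in logits.items()}


def exclusive_scores(logits):
    """Mutually exclusive labels: softmax, so all scores sum to 1."""
    highest = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(x - highest) for label, x in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical raw scores for one document (illustrative values only).
one_label = {"EARNINGS": 2.0}
two_labels = {"EARNINGS": 2.0, "M&A": 1.0}

# Sigmoid: the EARNINGS score is the same with or without M&A.
print(independent_scores(one_label)["EARNINGS"])
print(independent_scores(two_labels)["EARNINGS"])

# Softmax: adding M&A takes probability mass away from EARNINGS.
print(exclusive_scores(one_label)["EARNINGS"])
print(exclusive_scores(two_labels)["EARNINGS"])
```

With the softmax version, the EARNINGS score drops as soon as a competing label is present, which is the ambiguous situation described above. With the sigmoid version, the second label doesn't enter the calculation for the first one at all.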

You can profile it, but there shouldn't be a noticeable difference between one and two labels.
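
If you do want to measure it, something like the rough sketch below should be enough. The helper name mean_seconds_per_text is made up, and the model directories in the usage comment (model-earnings, model-both) are hypothetical output paths, so adjust them to wherever you trained your models.

```python
import time


def mean_seconds_per_text(predict, texts, repeats=5):
    """Average wall-clock seconds that `predict` spends on one text."""
    if not texts:
        raise ValueError("need at least one text to time")
    # Warm-up pass so one-time setup costs are not counted.
    for text in texts:
        predict(text)
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            predict(text)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(texts))


# Usage sketch (assumes spaCy is installed and that both models were
# trained into the hypothetical directories named below):
#
#   import spacy
#   texts = ["Quarterly revenue rose.", "The merger was approved."]
#   nlp_one = spacy.load("model-earnings")   # EARNINGS only
#   nlp_two = spacy.load("model-both")       # EARNINGS and M&A
#   print(mean_seconds_per_text(nlp_one, texts))
#   print(mean_seconds_per_text(nlp_two, texts))
```

Run it on a representative sample of your own documents, since timings on a handful of short strings can be noisy.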
