I am working on hierarchical text classification. There are two ways to approach this:
- Multi-label classification: Can this be done in Prodigy + spaCy?
- Level-wise models: Train a model at each level of the tree, but in this case we end up with many models. Is this doable off the shelf in spaCy?
Sure! The documentation on text classification is a good place to start. There's also a section on using hierarchical label schemes here: https://prodi.gy/docs/text-classification#large-label-sets
You can definitely train multiple text classification components in spaCy and combine them into a single pipeline using different names. The components will all write their predicted scores to the doc.cats property. So if your top-level text classifier predicts a high score for a label at the top level of your hierarchy, you can then look at the more fine-grained labels predicted by the other components.
You will end up with multiple components this way, but not multiple pipelines, so the runtime pipeline can still be pretty lightweight and efficient. (In spaCy v3, you can also share the same embeddings with multiple components, so even if you're using large transformer embeddings, you'll only need to load them once for all text classifiers instead of once per classifier.)
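
To make the "check the top level first, then look at the fine-grained labels" idea concrete, here is a minimal sketch in plain Python. It assumes the scores from all text classifiers have ended up in one dictionary, the way doc.cats would hold them, and that every component uses distinct label names so they don't overwrite each other. The label names, the HIERARCHY mapping, the resolve_labels helper and the 0.5 threshold are all hypothetical and only there for illustration; none of them are part of spaCy or Prodigy.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level label scheme: top-level label -> fine-grained labels.
# In a real pipeline, each level would be predicted by its own named component.
HIERARCHY: Dict[str, List[str]] = {
    "SPORTS": ["SPORTS_FOOTBALL", "SPORTS_TENNIS"],
    "TECH": ["TECH_HARDWARE", "TECH_SOFTWARE"],
}


def resolve_labels(
    cats: Dict[str, float],
    hierarchy: Dict[str, List[str]],
    threshold: float = 0.5,  # assumed cutoff, tune it for your data
) -> List[Tuple[str, str]]:
    """Return (top_label, best_fine_label) pairs for confident top-level labels."""
    paths = []
    for top_label, children in hierarchy.items():
        # Skip branches whose top-level score is below the threshold.
        if cats.get(top_label, 0.0) < threshold:
            continue
        # Only consider fine-grained labels that actually received a score.
        scored = [(child, cats[child]) for child in children if child in cats]
        if not scored:
            continue
        best_child, _ = max(scored, key=lambda item: item[1])
        paths.append((top_label, best_child))
    return paths


# Stand-in for doc.cats after all text classifiers have run (made-up scores).
example_cats = {
    "SPORTS": 0.91,
    "TECH": 0.12,
    "SPORTS_FOOTBALL": 0.20,
    "SPORTS_TENNIS": 0.85,
    "TECH_HARDWARE": 0.40,
    "TECH_SOFTWARE": 0.35,
}

print(resolve_labels(example_cats, HIERARCHY))
# [('SPORTS', 'SPORTS_TENNIS')]
```

With a real pipeline you would pass doc.cats instead of example_cats. The fine-grained scores under "TECH" are simply ignored here because the top-level score for that branch is low, which is the main point of checking the hierarchy top-down.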