Welcome to the forum @samirmsallem 
> First question: afaik there is no possibility to do this with ProdigyHF, correct? I only see recipes for NER and textcat but not both.
You're correct that there's no built-in recipe for multi-task (NER + textcat) training in ProdigyHF. The ProdigyHF recipes are built around the Hugging Face `transformers` library's task-specific paradigm, where models are specialized for particular tasks:

- `hf.train.ner` uses `AutoModelForTokenClassification`
- `hf.train.textcat` uses `AutoModelForSequenceClassification`
These are different model architectures with task-specific prediction heads. The recipes are designed to be straightforward, single-purpose training scripts that align with Hugging Face's task-specific model classes.
`transformers` does not offer a built-in architecture for multi-task learning, as this requires a custom architecture with multiple task heads.
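To make that concrete, here's a rough sketch (not something ProdigyHF provides) of what such a custom architecture could look like: one shared encoder loaded via `AutoModel`, with a token-classification head for NER and a sequence-classification head for textcat. The class name, base model, and label counts are just placeholders.

```python
from torch import nn
from transformers import AutoModel


class MultiTaskModel(nn.Module):
    """Hypothetical multi-task model: one shared encoder, two task-specific heads."""

    def __init__(self, model_name="bert-base-cased", n_ner_labels=5, n_cat_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)   # shared parameters
        hidden = self.encoder.config.hidden_size
        self.ner_head = nn.Linear(hidden, n_ner_labels)        # per-token logits
        self.textcat_head = nn.Linear(hidden, n_cat_labels)    # per-document logits

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_states = out.last_hidden_state             # (batch, seq_len, hidden)
        ner_logits = self.ner_head(token_states)         # token classification (NER)
        textcat_logits = self.textcat_head(token_states[:, 0])  # [CLS] -> textcat
        return ner_logits, textcat_logits
```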
I would recommend trying `spacy-transformers`, which uses multi-task learning by default (see the docs on the shared embedding layer). Both the NER and textcat components can be configured to use (and update) the same transformer embeddings via the `TransformerListener` architecture, effectively implementing multi-task learning.
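If you want to double-check that this is what your trained pipeline is doing, one option is to inspect the resolved config. This assumes a transformer-based config (e.g. from the quickstart widget) where both components use the listener architecture; the path to the pipeline is hypothetical.

```python
import spacy

# Hypothetical path: a pipeline trained with `spacy train` from a transformer-based config
nlp = spacy.load("training/model-best")
print(nlp.pipe_names)  # e.g. ['transformer', 'ner', 'textcat']

# Both components use the listener as their tok2vec sublayer, i.e. they read
# from (and backpropagate into) the shared transformer component.
for name in ("ner", "textcat"):
    tok2vec = nlp.config["components"][name]["model"]["tok2vec"]
    print(name, tok2vec["@architectures"])
    # -> spacy-transformers.TransformerListener.v1
```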
It's also very straightforward to export the data annotated with Prodigy to train a multi-component spaCy pipeline. The `data-to-spacy` command will take care of merging the textcat and NER annotations to create a single spaCy training example, and spaCy will also take care of transformer-specific tokenization.
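If you're curious what the merged examples look like, you can load the exported corpus with `DocBin` (the paths below are hypothetical) and see that each `Doc` carries both the entity spans and the category annotations:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
# Hypothetical output path from `prodigy data-to-spacy`
doc_bin = DocBin().from_disk("./corpus/train.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))

doc = docs[0]
print(doc.ents)  # entity spans merged in from the ner.manual dataset
print(doc.cats)  # category scores merged in from the textcat.manual dataset
```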
> Second: Is it possible to export the dataset in a scheme that I can perform a multi task training outside of Prodigy ex. with the HuggingFace trainer? I used ner.manual and textcat.manual for annotating.
If, however, you'd like to train directly with `transformers`, you can export the data with the `db-out` command to get it in a straightforward JSONL format and convert it into any format you need using a custom script. You can also check the source code of the ProdigyHF recipes to see how the conversion is done there.
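As a starting point, a conversion script could look roughly like this. The file names are placeholders, and it assumes the default `ner.manual` output (token-level spans with inclusive `token_end`) and `textcat.manual` used with the choice interface (selected labels in `"accept"`):

```python
import json


def load_jsonl(path):
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def ner_to_bio(example):
    """Turn Prodigy's token-level spans into per-token BIO tags."""
    tokens = [tok["text"] for tok in example["tokens"]]
    tags = ["O"] * len(tokens)
    for span in example.get("spans", []):
        tags[span["token_start"]] = f"B-{span['label']}"
        for i in range(span["token_start"] + 1, span["token_end"] + 1):
            tags[i] = f"I-{span['label']}"
    return {"tokens": tokens, "ner_tags": tags}


def textcat_labels(example):
    """With the choice interface, the selected labels end up in 'accept'."""
    return {"text": example["text"], "labels": example.get("accept", [])}


# Hypothetical file names -- whatever you exported with db-out
ner_data = [
    ner_to_bio(eg)
    for eg in load_jsonl("ner_dataset.jsonl")
    if eg.get("answer") == "accept"
]
textcat_data = [
    textcat_labels(eg)
    for eg in load_jsonl("textcat_dataset.jsonl")
    if eg.get("answer") == "accept"
]
```

From there you can build whatever structure your Hugging Face `Trainer` setup expects (e.g. a `datasets.Dataset` with labels aligned to the tokenizer output).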
> Third: If no, would a training of the same model first on NER labels of the dataset and then on the textcat labels would bring the same effect like multi task fine tuning?
Multi-task fine-tuning differs from sequential single-task fine-tuning in several important ways:
In multi-task fine-tuning, a model is trained simultaneously on multiple tasks, with a shared parameter space. This means:
- The model learns to optimize for all tasks at once
- All tasks contribute to gradient updates during training
- The model develops representations that can generalize across tasks
- Knowledge and patterns learned from one task can directly benefit other tasks during training
In sequential fine-tuning, a model is fine-tuned on one task, then fine-tuned again on a different task:
- The model may forget knowledge from earlier tasks (catastrophic forgetting)
- Later tasks can overwrite parameters important for earlier tasks
- Task order can significantly impact final performance
- Knowledge transfer between tasks is limited by the sequential nature
Consequently, training the same model sequentially (first on NER, then on textcat) would not produce the same effect as multi-task fine-tuning.
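To make the difference concrete, here's what a single multi-task update could look like, reusing the hypothetical `MultiTaskModel` sketch from above (the batch keys are assumptions): both task losses are summed into one backward pass, so the shared encoder gets gradients from both tasks at every step, whereas sequential fine-tuning only ever sends it gradients from one task at a time.

```python
import torch
from torch.nn import functional as F

model = MultiTaskModel()  # hypothetical class from the earlier sketch
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)


def multitask_step(batch):
    ner_logits, textcat_logits = model(batch["input_ids"], batch["attention_mask"])
    ner_loss = F.cross_entropy(
        ner_logits.view(-1, ner_logits.size(-1)),
        batch["ner_labels"].view(-1),
        ignore_index=-100,  # mask out padding / special tokens
    )
    textcat_loss = F.cross_entropy(textcat_logits, batch["textcat_labels"])
    loss = ner_loss + textcat_loss  # both tasks contribute to the same gradient update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```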
In spaCy it should be relatively easy to set up experiments with separate and shared embedding layers to see how much benefit you get from a multi-task setup.