Welcome to the forum @samirmsallem 
> First question: afaik there is no possibility to do this with ProdigyHF, correct? I only see recipes for NER and textcat but not both.
You're correct that there's no built-in recipe for multi-task (NER + textcat) training in ProdigyHF. The ProdigyHF recipes are built around the Hugging Face `transformers` library's task-specific paradigm, where models are specialized for particular tasks:

- `hf.train.ner` uses `AutoModelForTokenClassification`
- `hf.train.textcat` uses `AutoModelForSequenceClassification`
These are different model architectures with task-specific prediction heads. The recipes are designed to be straightforward, single-purpose training scripts that align with Hugging Face's task-specific model classes.
`transformers` does not offer a built-in architecture for multi-task learning, as this requires a custom architecture with multiple task heads.
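To make that concrete, here's a rough sketch (not something ProdigyHF provides) of what such a custom architecture could look like: one shared encoder loaded via `AutoModel`, with a token-classification head for NER and a sequence-classification head for textcat. The class name, base model, and label counts are just placeholders.

```python
from torch import nn
from transformers import AutoModel


class MultiTaskModel(nn.Module):
    """Hypothetical multi-task model: one shared encoder, two task-specific heads."""

    def __init__(self, model_name="bert-base-cased", n_ner_labels=5, n_cat_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)   # shared parameters
        hidden = self.encoder.config.hidden_size
        self.ner_head = nn.Linear(hidden, n_ner_labels)        # per-token logits
        self.textcat_head = nn.Linear(hidden, n_cat_labels)    # per-document logits

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_states = out.last_hidden_state             # (batch, seq_len, hidden)
        ner_logits = self.ner_head(token_states)         # token classification (NER)
        textcat_logits = self.textcat_head(token_states[:, 0])  # [CLS] -> textcat
        return ner_logits, textcat_logits
```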
I would recommend trying `spacy-transformers`, which uses multi-task learning by default (see the docs on the shared embedding layer). Both the NER and textcat components can be configured to use (and update) the same transformer embeddings via the `TransformerListener` architecture, effectively implementing multi-task learning.
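If you want to double-check that this is what your trained pipeline is doing, one option is to inspect the resolved config. This assumes a transformer-based config (e.g. from the quickstart widget) where both components use the listener architecture; the path to the pipeline is hypothetical.

```python
import spacy

# Hypothetical path: a pipeline trained with `spacy train` from a transformer-based config
nlp = spacy.load("training/model-best")
print(nlp.pipe_names)  # e.g. ['transformer', 'ner', 'textcat']

# Both components use the listener as their tok2vec sublayer, i.e. they read
# from (and backpropagate into) the shared transformer component.
for name in ("ner", "textcat"):
    tok2vec = nlp.config["components"][name]["model"]["tok2vec"]
    print(name, tok2vec["@architectures"])
    # -> spacy-transformers.TransformerListener.v1
```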
It's also very straightforward to export the data annotated with Prodigy to train a multi-component spaCy pipeline. The `data-to-spacy` command will take care of merging the textcat and NER annotations to create a single spaCy training example, and spaCy will also take care of transformer-specific tokenization.
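If you're curious what the merged examples look like, you can load the exported corpus with `DocBin` (the paths below are hypothetical) and see that each `Doc` carries both the entity spans and the category annotations:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
# Hypothetical output path from `prodigy data-to-spacy`
doc_bin = DocBin().from_disk("./corpus/train.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))

doc = docs[0]
print(doc.ents)  # entity spans merged in from the ner.manual dataset
print(doc.cats)  # category scores merged in from the textcat.manual dataset
```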
> Second: Is it possible to export the dataset in a scheme that I can perform a multi task training outside of Prodigy ex. with the HuggingFace trainer? I used ner.manual and textcat.manual for annotating.
If, however, you'd like to train directly with `transformers`, you can export the data with the `db-out` command to get it in a straightforward JSONL format and convert it into any format you need using a custom script. You can also check the source code of the ProdigyHF recipes to see how the conversion is done there.
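As a starting point, a conversion script could look roughly like this. The file names are placeholders, and it assumes the default `ner.manual` output (token-level spans with inclusive `token_end`) and `textcat.manual` used with the choice interface (selected labels in `"accept"`):

```python
import json


def load_jsonl(path):
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def ner_to_bio(example):
    """Turn Prodigy's token-level spans into per-token BIO tags."""
    tokens = [tok["text"] for tok in example["tokens"]]
    tags = ["O"] * len(tokens)
    for span in example.get("spans", []):
        tags[span["token_start"]] = f"B-{span['label']}"
        for i in range(span["token_start"] + 1, span["token_end"] + 1):
            tags[i] = f"I-{span['label']}"
    return {"tokens": tokens, "ner_tags": tags}


def textcat_labels(example):
    """With the choice interface, the selected labels end up in 'accept'."""
    return {"text": example["text"], "labels": example.get("accept", [])}


# Hypothetical file names -- whatever you exported with db-out
ner_data = [
    ner_to_bio(eg)
    for eg in load_jsonl("ner_dataset.jsonl")
    if eg.get("answer") == "accept"
]
textcat_data = [
    textcat_labels(eg)
    for eg in load_jsonl("textcat_dataset.jsonl")
    if eg.get("answer") == "accept"
]
```

From there you can build whatever structure your Hugging Face `Trainer` setup expects (e.g. a `datasets.Dataset` with labels aligned to the tokenizer output).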
> Third: If no, would a training of the same model first on NER labels of the dataset and then on the textcat labels would bring the same effect like multi task fine tuning?
Multi-task fine-tuning differs from sequential single-task fine-tuning in several important ways:
In multi-task fine-tuning, a model is trained simultaneously on multiple tasks, with a shared parameter space. This means:
- The model learns to optimize for all tasks at once
- All tasks contribute to gradient updates during training
- The model develops representations that can generalize across tasks
- Knowledge and patterns learned from one task can directly benefit other tasks during training
In sequential fine-tuning, a model is fine-tuned on one task, then fine-tuned again on a different task:
- The model may forget knowledge from earlier tasks (catastrophic forgetting)
- Later tasks can overwrite parameters important for earlier tasks
- Task order can significantly impact final performance
- Knowledge transfer between tasks is limited by the sequential nature
Consequently, training the same model sequentially (first on NER, then on textcat) would not produce the same effect as multi-task fine-tuning.
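To make the difference concrete, here's what a single multi-task update could look like, reusing the hypothetical `MultiTaskModel` sketch from above (the batch keys are assumptions): both task losses are summed into one backward pass, so the shared encoder gets gradients from both tasks at every step, whereas sequential fine-tuning only ever sends it gradients from one task at a time.

```python
import torch
from torch.nn import functional as F

model = MultiTaskModel()  # hypothetical class from the earlier sketch
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)


def multitask_step(batch):
    ner_logits, textcat_logits = model(batch["input_ids"], batch["attention_mask"])
    ner_loss = F.cross_entropy(
        ner_logits.view(-1, ner_logits.size(-1)),
        batch["ner_labels"].view(-1),
        ignore_index=-100,  # mask out padding / special tokens
    )
    textcat_loss = F.cross_entropy(textcat_logits, batch["textcat_labels"])
    loss = ner_loss + textcat_loss  # both tasks contribute to the same gradient update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```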
In spaCy it should be relatively easy to set up experiments with separate and shared embedding layers to see how much benefit you get from a multi-task setup.