Reusing Base Model Parts to Save Space Across Multiple Classifiers


I was wondering whether I could get some advice on saving memory and disk space by reusing model parts across multiple related text classifiers.

I recently trained a set of models in the same domain (trying to classify unusual employment statuses from job titles), each using en_core_web_sm. These models are meant to semantically filter out false positives from a substring extraction. I want to keep the models separate, rather than turn them into a single multi-class classifier, as each was trained on a narrow, pre-filtered selection of input data points and therefore performs badly when that selection is widened.

However, it would be amazing if the four output models I've trained could reuse common pieces (the parser, the tagger, etc.) through the same spaCy loader, as this would reduce the memory and storage demands by ~4x. I was previously advised that this could be done by using the first output model as the spacy_model argument when training the subsequent models, as below, but this doesn't seem to have made the model directories any smaller.

python -m prodigy train textcat grads_final en_core_web_sm --output grad_model --eval-split 0.15
python -m prodigy train textcat intern_final grad_model --output intern_model --eval-split 0.15
python -m prodigy train textcat contractor_final grad_model --output contractor_model --eval-split 0.
python -m prodigy train textcat trainee_final grad_model --output trainee_model --eval-split 0.15

Any suggestions appreciated.

How large is your model and what takes up the most space? The en_core_web_sm model and its components should be very small; the only thing that usually makes a difference in terms of size is the word vectors.
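One way to answer the size question is to list how much space each top-level entry of a saved model directory takes. The following is a minimal sketch using only the Python standard library; the helper names (`dir_size_bytes`, `size_breakdown`, `print_breakdown`) are made up for illustration, and it assumes the model was saved as a directory with one subdirectory per pipeline component.

```python
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size in bytes of all regular files below `path`."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def size_breakdown(model_dir: Path) -> dict:
    """Map each top-level entry of a model directory to its size in bytes."""
    breakdown = {}
    for entry in model_dir.iterdir():
        if entry.is_dir():
            breakdown[entry.name] = dir_size_bytes(entry)
        else:
            breakdown[entry.name] = entry.stat().st_size
    return breakdown


def print_breakdown(model_dir: Path) -> None:
    """Print the entries of a model directory, largest first."""
    sizes = size_breakdown(model_dir)
    for name, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        print(f"{size / 1e6:8.2f} MB  {name}")
```

Calling `print_breakdown(Path("grad_model"))` on one of the output directories from the commands above should show which entries account for most of the space.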

The approach here will use the base model and add a text classification component to it, or update the text classifier if one is already present. So I'm not 100% sure that workflow does what you want, because you're essentially updating the same classifier multiple times with different data.

Yes, they are all fairly small, each about 18.5 MB. For each of the models, the parser, ner, and tagger combined take up ~11.5 MB and, unless I'm mistaken, are identical. I guess ultimately it would be nice if I were able to reuse these parts across the four models?
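Whether those components really are identical can be checked before deciding anything. Below is a rough sketch, again standard library only, that hashes each component directory and compares the digests across models. The function names are hypothetical, and it assumes each model directory contains subdirectories named parser, ner, and tagger; it says nothing about whether the pipelines can actually share them at load time.

```python
import hashlib
from pathlib import Path


def dir_digest(path: Path) -> str:
    """Hash the relative file names and contents of a directory, in sorted order."""
    h = hashlib.sha256()
    for f in sorted(p for p in path.rglob("*") if p.is_file()):
        h.update(f.relative_to(path).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(f.read_bytes())
    return h.hexdigest()


def shared_components(model_dirs, components=("parser", "ner", "tagger")):
    """Return {component: bool}, True if it is byte-identical in every model."""
    result = {}
    for name in components:
        paths = [Path(d) / name for d in model_dirs]
        if not all(p.is_dir() for p in paths):
            # A component missing from any model cannot be shared.
            result[name] = False
            continue
        result[name] = len({dir_digest(p) for p in paths}) == 1
    return result
```

For example, `shared_components(["grad_model", "intern_model", "contractor_model", "trainee_model"])`. If all three come back True, then going by the numbers above, keeping one copy instead of four would take the total from about 4 × 18.5 = 74 MB down to about 74 − 3 × 11.5 = 39.5 MB on disk.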