I have observed that Prodigy is a really good tool for quickly building strong baselines for text classification/NER tasks. However, when it comes to production-level tasks where accuracy and F1-score are both crucial, transformer-based models do a better job than the spaCy models I train through Prodigy.
The catch is model size: whereas spaCy models (trained from blank:en) are as small as 8-10 MB, transformer-based models run to 700-800 MB. For the text classification task I am running, spaCy gives an F1-score of 83%, while distilBERT gives me 94%.
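For context, here is roughly what the two setups look like as spaCy v3 configs (trimmed excerpts; I'm sketching the distilBERT side via spacy-transformers, and the exact architecture versions and settings in my runs may differ slightly):

```ini
# small pipeline: a bag-of-words textcat trained from blank:en (~10 MB)
[components.textcat]
factory = "textcat"

[components.textcat.model]
@architectures = "spacy.TextCatBOW.v2"
exclusive_classes = true
ngram_size = 1
no_output_layer = false

# large pipeline: distilBERT backbone via spacy-transformers (~700 MB),
# with a textcat component listening to it
[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "distilbert-base-uncased"
tokenizer_config = {"use_fast": true}

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
```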
How can I obtain a better tradeoff between model size and model performance? Is there anything Prodigy or spaCy offers that could help me get good accuracy without having to increase my model size too much?
Thanks & Regards,
PS: I tried hyperparameter tuning on the existing spaCy/Prodigy models, including dropout, learning rate, and the train/validation split ratio, but that didn't yield any substantially better results.
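For reference, a typical tuning run looked roughly like this (a sketch using spaCy v3's dotted config overrides; the paths and values here are placeholders):

```bash
# retrain with a different dropout and learning rate;
# the train/dev split itself was varied when exporting the .spacy files
python -m spacy train config.cfg \
  --output ./output \
  --paths.train ./train.spacy \
  --paths.dev ./dev.spacy \
  --training.dropout 0.2 \
  --training.optimizer.learn_rate 0.0005
```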