Using previous model-last as base model in prodigy train

In our previous iterations, we were always running `prodigy train` without a base model for text classification (multi-label), and the score got as high as 0.82. When I used that previously trained model-last as the `--base-model` for our latest iteration, we got to 0.90+. Is this a good idea, or is this overfitting? We are using spacy.TextCatEnsemble.v2 as the architecture.
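
For reference, the two runs looked roughly like this (the dataset name and output directories are placeholders, and this assumes a recent Prodigy v1.11+ / spaCy v3 setup):

```bash
# Earlier iterations: train the multi-label text classifier from scratch, no base model
prodigy train ./model-v1 --textcat-multilabel my_dataset

# Latest iteration: reuse the previous run's model-last as the base model
prodigy train ./model-v2 --textcat-multilabel my_dataset --base-model ./model-v1/model-last
```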

Thanks.

hi @joebuckle!

Great question! In general, we recommend against this:

Yes - it's typically better to train a model from scratch, using the same full corpus (instead of updating the same artifact over and over again, which often makes it much harder to avoid overfitting and forgetting effects).

The one thing you may want to try (if you're not already) is using a base model that comes with pretrained vectors (e.g., en_core_web_md or en_core_web_lg). There's a bit about it in the docs:

Using pretrained word embeddings to initialize your model is easy and can make a big difference. If you’re using spaCy, try using the en_core_web_lg model as the base model. If you’re working with domain-specific texts, you can train your own vectors and create a base model with them.
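
Concretely, that could look something like this (dataset name and output directory are placeholders; the packaged model needs to be downloaded first):

```bash
# Download the large English model so its pretrained vectors are available
python -m spacy download en_core_web_lg

# Train the multi-label text classifier on top of those vectors
prodigy train ./model-vectors --textcat-multilabel my_dataset --base-model en_core_web_lg

# (Optional) for domain-specific texts: convert your own word2vec-format vectors
# into a model you can pass to --base-model the same way
# python -m spacy init vectors en ./my_vectors.txt ./my_vectors_model
```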

Thank you for informing me that using the previously trained model as the base model is not a good idea. :slight_smile:

I tried using en_core_web_md and en_core_web_lg as the base model, but it doesn't seem to improve the textcat_multilabel score compared to training without a base model.