Training/Evaluation dilemma

honnibal · July 17, 2019, 10:05am

If you have all the training data available, you usually want to start from a blank (or vectors) model, rather than using one you’ve previously trained, and training on top of it.

The mechanics of whether the data will be “double counted” are a little subtle, though. In theory you’ll probably converge to a similar solution, even if you do start with the previous model. To understand why this is the case, remember that the model you load in, which you previously trained on your first 1000 texts, is going to have pretty low training loss on those texts. If it’s getting those examples mostly right, then it won’t update much against them. So the total magnitude of the updates you’re making will initially be dominated by the new texts.
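To make that intuition concrete, here is a minimal sketch using a toy logistic regression in plain Python. It is not spaCy or Prodigy code. The 1000/500 split mirrors the numbers in this thread, but the feature layout, the `make_examples` helper and the training schedule are all assumptions made up for illustration. The model is first fit on the "old" texts only, and then the summed gradient is compared for the old and the new texts.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient(weights, examples):
    """Summed log-loss gradient of a logistic model over `examples`."""
    grad = [0.0] * len(weights)
    for features, label in examples:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x
    return grad


def norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def make_examples(rng, n, active):
    """Hypothetical toy 'texts': a bias feature plus one informative
    feature at index `active` whose sign determines the label."""
    examples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        value = (0.5 + rng.random()) * (1.0 if label else -1.0)
        features = [1.0, 0.0, 0.0]
        features[active] = value
        examples.append((features, label))
    return examples


rng = random.Random(0)
old_texts = make_examples(rng, 1000, active=1)  # what the loaded model saw
new_texts = make_examples(rng, 500, active=2)   # evidence it has never seen

# Stand-in for the "previously trained" model: gradient descent on the
# old texts only, using the mean gradient as the step.
weights = [0.0, 0.0, 0.0]
for _ in range(300):
    grad = gradient(weights, old_texts)
    weights = [w - g / len(old_texts) for w, g in zip(weights, grad)]

# Resume on all 1500 texts and see which group drives the first update.
old_total = norm(gradient(weights, old_texts))
new_total = norm(gradient(weights, new_texts))
print("gradient norm from old texts:", old_total)
print("gradient norm from new texts:", new_total)
```

In this toy setup the new texts contribute by far the larger share of the gradient, even though there are half as many of them. A real NER model is much less tidy than this, so treat it as an illustration of the mechanism rather than a prediction of how your training run will behave.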

That said: it’s simply much more difficult to reason about the performance and training dynamics if you start from an intermediate state, rather than starting from the random initialisation. You also don’t save much time doing that, so there’s little advantage. The only time you want to update on top of existing weights is if you don’t have access to the initial training data (e.g. with spaCy models such as en_core_web_lg, which are trained on proprietary datasets we can’t distribute), or when the initial training took a very long time (e.g. with language model pretraining).

The other consideration is simple repeatability. If you’re always training on top of a prior model, then it’ll be really hard to start from scratch and reproduce the same result. You would have to first train on the original 1000 texts, save that model out, and then train on top of it with the full 1500. Being able to repeat your work is obviously good for sanity.

The other time you need to use a non-blank model is in commands like ner.teach and ner.make-gold. But notice that these recipes are designed to produce annotations, not models. It makes sense to start from a non-blank model if the goal is to do model-assisted annotation. But if you just want to output a model, you want to start from an initial condition that’s easy to reason about, so you don’t want to resume from an arbitrary model.

For future readers, @ines’s reply on a related thread might also be worth reading: Model Training for NER
