It would be nice to have more control over optimization with batch-train, especially the learning rate. Right now, if I run batch-train for 10 epochs and see that the model is still improving, I can’t reload the saved model and continue training (efficiently) because the default learning rate is too large. I have no choice but to rerun training for more epochs.
Another nice-to-have would be early stopping, where the model keeps training until there has been no improvement for --wait epochs.
I agree that these things are nice. We used to write out the model after each epoch, but if the pretrained vectors are large this gets annoying.
I’m reluctant to make the command too complicated though. I think customising the recipe will work better for you.
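As an illustration of the kind of customisation this could involve, here is a minimal sketch of the early stopping described above. It is plain Python and not part of the batch-train recipe: train_one_epoch and evaluate are hypothetical callbacks standing in for whatever the recipe does per epoch, and wait plays the role of the proposed --wait setting.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[], None],  # hypothetical: runs one pass over the data
    evaluate: Callable[[], float],        # hypothetical: returns a score, higher is better
    wait: int = 3,
    max_epochs: int = 100,
) -> Tuple[int, float]:
    """Train until the score has not improved for `wait` consecutive epochs.

    Returns the index of the best epoch and the score reached there.
    """
    best_score = float("-inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_one_epoch()
        score = evaluate()
        if score > best_score:
            # New best: remember it. A real recipe would also save the model here.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= wait:
            # No improvement for `wait` epochs in a row, so stop.
            break
    return best_epoch, best_score
```

Saving the model only when the score improves also avoids writing it out after every epoch, which matters when the vectors are large.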
You can set the learning rate by writing to the optimizer.alpha attribute within the recipe (I know, this should be named better…). Advice about spaCy’s hyper-parameters can be found here: https://spacy.io/usage/training#tips
Note that there are several settings that interact. In particular, the parameter averaging means later iterations have less impact on the model, which is a bit like annealing the learning rate. The Adam solver, gradient clipping and batch size all interact too. I usually find the model isn’t that sensitive to the learning rate if you don’t change other settings. I don’t usually modify the learning rate myself, but maybe I should.
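A rough sketch of what writing to optimizer.alpha per epoch could look like. The only thing taken from the recipe here is the attribute name alpha. The optimizer below is a stand-in object, and the decay schedule and the decayed_rate helper are assumptions made up for illustration, not defaults of the library.

```python
from types import SimpleNamespace


def decayed_rate(initial_rate: float, decay: float, epoch: int) -> float:
    """Hypothetical schedule: shrink the rate as 1 / (1 + decay * epoch)."""
    return initial_rate / (1.0 + decay * epoch)


# Stand-in for the optimizer used inside the recipe; only `.alpha` matters here.
optimizer = SimpleNamespace(alpha=0.001)

initial_rate = optimizer.alpha
rates = []
for epoch in range(5):
    # Overwrite the learning rate before running this epoch's updates.
    optimizer.alpha = decayed_rate(initial_rate, decay=0.5, epoch=epoch)
    rates.append(optimizer.alpha)
    # ... the recipe's updates for this epoch would run here ...
```

To continue training a saved model, the same idea applies: set optimizer.alpha to a smaller starting value before the first update rather than relying on the default.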
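To make the averaging point concrete, here is a toy sketch using a plain running mean over a single weight. This is an assumption for illustration and not necessarily the exact averaging scheme the library uses, but it shows the effect: the value at step t enters the average with weight 1/t, so the same change matters less the later it happens.

```python
from typing import Iterable, List


def running_average(values: Iterable[float]) -> List[float]:
    """Return the running mean after each step; step t contributes with weight 1/t."""
    avg = 0.0
    history = []
    for t, value in enumerate(values, start=1):
        avg += (value - avg) / t
        history.append(avg)
    return history


# The raw weight jumps from 1.0 to 2.0, once early and once late.
early_jump = running_average([1.0, 2.0])
late_jump = running_average([1.0] * 9 + [2.0])

# How far the averaged weight moved because of the jump in each case.
early_shift = early_jump[-1] - early_jump[-2]
late_shift = late_jump[-1] - late_jump[-2]
```

The same jump shifts the averaged weight much less at step 10 than at step 2, which is why changing the learning rate late in training has a smaller effect than one might expect.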