Model Training for NER


I would like to train a new entity label that is unrelated to the labels already pre-trained in spaCy’s models. Given that, I thought training a model from scratch (blank NER) would be a good place to start.

I have a few questions regarding training a blank model.

  1. I have a blank NER model on hand (created from en_core_web_lg). When using a recipe to train my label (e.g. ner.match or ner.teach), do I pass this blank model as the model argument of the recipe?

  2. When I run ner.match to annotate via a patterns file, I understand that no active learning takes place. So can I assume that the annotations are only learned when I run ner.batch-train?

  3. When running the batch training, do I specify the blank NER model as the spaCy model argument?

  4. After I have trained the model and saved it to a directory, whenever I want to train it again on more data, or continue annotating from where I left off, do I just load the trained model? Are there any other arguments I need to pass? I noticed there is a --resume option in ner.teach…

  5. The next time I train via batch-train, do I train a brand new blank model or re-train the existing model?


Yes, you pass the blank model in as the model argument of the recipe! :slightly_smiling_face:

Exactly! Also, keep in mind that recipes that do support active learning also don't just modify the model in place. (That would be bad – the model should always be properly batch trained afterwards.) The active learning only helps create better annotations.
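To make the split between collecting annotations and training concrete, here is a toy sketch of pattern-based candidate generation. It is not Prodigy's actual implementation: `match_candidates`, the `GADGET` label and the example phrases are all made up for illustration, and only exact-phrase patterns are handled. The point is just that matching produces annotation records and never touches any model weights.

```python
import re


def match_candidates(texts, patterns):
    """Yield one annotation task per pattern match; no model is involved."""
    for text in texts:
        for entry in patterns:
            # Case-insensitive whole-word search for the exact phrase.
            regex = re.compile(
                r"\b" + re.escape(entry["pattern"]) + r"\b", re.IGNORECASE
            )
            for m in regex.finditer(text):
                span = {"start": m.start(), "end": m.end(), "label": entry["label"]}
                yield {"text": text, "spans": [span]}


# Hypothetical label and phrase, for illustration only.
patterns = [{"label": "GADGET", "pattern": "smart watch"}]
texts = ["I bought a smart watch yesterday.", "Nothing relevant here."]

tasks = list(match_candidates(texts, patterns))

# The human decision is attached during annotation; learning only happens
# later, in a separate batch training step over the saved records.
annotated = [dict(task, answer="accept") for task in tasks]
```

The records in `annotated` are what a later batch training run would consume; nothing has been learned up to this point.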

Yes, the model passed into the training command is the model you want to update.

If you want to annotate with a model in the loop, you ideally want to be using the model you previously updated, yes. For recipes that don't use any active learning it usually doesn't matter – they only really use the model for tokenization etc., so you could theoretically also pass in the untrained base model.

We usually recommend starting with the blank base model every time and then updating it with all of the annotations. This may help prevent unintended interactions between the existing weights and new examples, and also makes it easier to compare your results, because they were produced by the same process, just with different amounts of data.
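As a rough sketch of what "all of the annotations" means in practice, the snippet below combines two annotation rounds into one training set before each run. The helper `merge_batches`, the `GADGET` label and the example records are hypothetical, and this says nothing about how Prodigy stores or merges data internally; it only illustrates retraining on the full set instead of stacking updates on a previously trained model.

```python
import json


def merge_batches(*batches):
    """Combine annotation batches, keeping the latest decision per (text, spans)."""
    merged = {}
    for batch in batches:
        for task in batch:
            key = (task["text"], json.dumps(task.get("spans", []), sort_keys=True))
            merged[key] = task  # later batches overwrite earlier duplicates
    return list(merged.values())


# Hypothetical annotation records from two separate sessions.
first_round = [
    {
        "text": "I bought a smart watch.",
        "spans": [{"start": 11, "end": 22, "label": "GADGET"}],
        "answer": "accept",
    },
]
second_round = [
    {
        "text": "I bought a smart watch.",
        "spans": [{"start": 11, "end": 22, "label": "GADGET"}],
        "answer": "accept",
    },
    {
        "text": "Watch out!",
        "spans": [{"start": 0, "end": 5, "label": "GADGET"}],
        "answer": "reject",
    },
]

# Every training run would start from the same blank base model and use
# this full set, so runs differ only in how much data they saw.
all_examples = merge_batches(first_round, second_round)
```

Because every run starts from the same blank model, a change in accuracy between runs can be attributed to the extra data rather than to leftover weights.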