No pre-trained model to import when ner.batch-train


As we are exploring how we can improve a current annotation workflow, we were wondering: is there a way to avoid importing en_core_web_sm, i.e. to run this without a pre-trained model?

Sure, but you’ll still need to pass in a base model to start with that includes the language data, tokenization rules etc. This can be a completely blank model with no weights – but you always need to start with something.

To save out a blank model, you can run the following:

import spacy

nlp = spacy.blank("en")  # or whichever language you want to use
nlp.to_disk("/path/to/model")  # save the blank model to a directory

Or a handy one-liner on the command line:

python -c "import spacy; spacy.blank('en').to_disk('/path/to/model')"

You can then load in /path/to/model as the base model.
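To sanity-check that the saved directory works as a base model, you can load it back with spaCy. This is a minimal sketch; the temporary path used here is just an example, not the path from the commands above:

```python
import spacy
import tempfile

# Example path for illustration only.
path = tempfile.mkdtemp() + "/blank_en_model"

# Save a blank English pipeline, then load it back.
spacy.blank("en").to_disk(path)
nlp = spacy.load(path)

# The reloaded pipeline has the language data but no trained weights.
print(nlp.lang)        # "en"
print(nlp.pipe_names)  # no trained components yet
```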