Hi,
As we explore how to improve a current annotation, we are wondering whether there is a way to avoid importing en_core_web_sm, i.e. to work without a pre-trained model.
Sure, but you’ll still need to pass in a base model to start with that includes the language data, tokenization rules etc. This can be a completely blank model with no weights – but you always need to start with something.
To save out a blank model, you can run the following:
import spacy
nlp = spacy.blank("en") # or whichever language you want to use
nlp.to_disk("/path/to/model")
Or a handy one-liner on the command line:
python -c "import spacy;spacy.blank('en').to_disk('/path/to/model')"
You can then load in /path/to/model as the base model.
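If you want to double-check what ends up on disk, here is a minimal sketch that saves a blank model, loads it back and confirms it only carries the language data and tokenizer. The helper name save_and_reload_blank and the temporary directory are just assumptions for illustration; the temporary directory stands in for /path/to/model.

```python
import tempfile
from pathlib import Path

import spacy


def save_and_reload_blank(lang, out_dir):
    """Save a blank pipeline for `lang` to `out_dir` and load it back (hypothetical helper)."""
    nlp = spacy.blank(lang)  # language data and tokenizer only, no trained weights
    nlp.to_disk(out_dir)
    return spacy.load(out_dir)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /path/to/model so the example cleans up after itself
    model_dir = Path(tmp) / "blank_en_model"
    nlp = save_and_reload_blank("en", model_dir)

    # A blank model has no pipeline components, but tokenization still works
    pipe_names = list(nlp.pipe_names)
    tokens = [token.text for token in nlp("This is a sentence.")]
    print(pipe_names)
    print(tokens)
```

The empty component list is what you would expect from a model with no weights: the tokenizer is part of the language data, so it works without any trained components.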