packaging new models


I’ve been training a TextCategorizer and added it to the pipeline of the existing en_core_web_lg model. Because I want to use this model in Prodigy, I decided to package it as follows:

  1. Save the model to disk with the nlp.to_disk() method.
  2. Create a package skeleton with the command line option python -m spacy package.
  3. Build a source distribution with python setup.py sdist.
  4. Finally, install the resulting tar.gz archive with pip install (packagename).
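For reference, steps 2–4 above can be sketched as a small shell function. This is a sketch, not the exact commands from the post: the function name, paths, and the glob for the generated package directory (spacy package normally writes a single <lang>_<name>-<version> directory into the output dir) are my assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Steps 2-4 of the packaging workflow; assumes step 1 (nlp.to_disk)
# has already written the model to model_dir.
package_model() {
  local model_dir=$1 out_dir=$2
  python -m spacy package "$model_dir" "$out_dir"   # 2. create package skeleton
  ( cd "$out_dir"/*/ && python setup.py sdist )     # 3. build source distribution
  pip install "$out_dir"/*/dist/*.tar.gz            # 4. install the archive
}

# Usage (not run here): package_model ./my_model ./packages
```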

This all works perfectly, as long as I do not change the name of the model in meta.json. When I do, I get the following error message when loading the model in Python:

Can’t find model ‘en_core_web_lg.vectors’. It doesn’t seem to be a shortcut link, a Python package or a valid path to a data directory.

This can be solved by loading en_core_web_lg separately, but that does not allow me to load the model into Prodigy. Even manually setting the name of the vectors in meta.json does not work for me. Could you advise on how to solve the above problem and rename the model? As a quick fix, I will simply overwrite the existing en_core_web_lg model, though that will be inconvenient in the long term.

Many thanks for this great application!

Hi Jurre,

Thanks for the detailed report. I think this bug might have been fixed in v2.1.0a0, if you want to try that version of spaCy. It’s an alpha, and you’ll have to download a new model — but hopefully it’ll take care of this problem.

The problem occurs because we were storing the vectors' name in a global variable, which fails when multiple models are loaded. When we fixed this bug, we wanted to make sure we didn't break compatibility with existing models, but the logic for that patch is a bit brittle, which I think is why it's breaking for you.

You might also want to check the cfg files in the model's subdirectories. It's possible the values there are the ones that are incorrect, leading to the wrong data being written out. Alternatively, overwriting the vectors name before saving might fix it too.
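If you want to rename the model and its vectors after the fact, patching meta.json directly might be worth a try. A minimal sketch follows; note the assumptions: the "vectors": {"name": ...} layout is what I'd expect from a spaCy v2 meta.json but you should check your own file first, and rename_model and all the demo names are made up for illustration. (Before nlp.to_disk(), setting nlp.vocab.vectors.name may achieve the same thing, if that attribute exists in your spaCy version.)

```python
import json
import pathlib
import tempfile

def rename_model(model_dir, new_name):
    """Rewrite meta.json so the model and its vectors use new_name.

    Assumes the spaCy v2-style layout where the vectors name lives
    under meta["vectors"]["name"]; check your meta.json first.
    """
    meta_path = pathlib.Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text())
    meta["name"] = new_name
    if isinstance(meta.get("vectors"), dict) and "name" in meta["vectors"]:
        meta["vectors"]["name"] = new_name + ".vectors"
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta

# Demo on a throwaway directory with a minimal, made-up meta.json:
with tempfile.TemporaryDirectory() as tmp:
    fake_meta = {"name": "core_web_lg",
                 "vectors": {"width": 300, "name": "en_core_web_lg.vectors"}}
    (pathlib.Path(tmp) / "meta.json").write_text(json.dumps(fake_meta))
    meta = rename_model(tmp, "my_textcat_model")
    print(meta["vectors"]["name"])  # my_textcat_model.vectors
```

After patching, rebuilding the package from the updated directory (spacy package, sdist, pip install) should pick up the new names.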