packaging new models

Hello,

I’ve been training a TextCategorizer and added it to the pipeline of an existing en_core_web_lg model. Because I want to use this model in Prodigy, I decided to package it using the following steps (roughly as sketched in the code below):

  1. Save the model with the nlp.to_disk() method.
  2. Create a package with the command line tool python -m spacy package.
  3. Build a source distribution with python setup.py sdist.
  4. Finally, install the created tar.gz package with pip install (packagename).
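
For completeness, this is roughly what I’m running; the paths and names below are just placeholders:

```python
import spacy

# Load the existing model and add the trained TextCategorizer to its pipeline
nlp = spacy.load("en_core_web_lg")
textcat = nlp.create_pipe("textcat")
nlp.add_pipe(textcat)
# ... training of the textcat component happens here ...

# 1. Save the model to a directory
nlp.to_disk("/tmp/my_textcat_model")

# 2. Create a package directory (run in the shell):
#      python -m spacy package /tmp/my_textcat_model /tmp/packages
# 3. Build a source distribution from inside the generated package directory:
#      python setup.py sdist
# 4. Install the resulting archive:
#      pip install dist/<packagename>.tar.gz
```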

This all works perfectly, as long as I do not change the name of the model in meta.json. When I do, I get the following error message when loading the model into Python:

```
Can't find model 'en_core_web_lg.vectors'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
```

This can be solved by loading en_core_web_lg separately, but that does not allow me to load the model into Prodigy. Even manually setting the name of the vectors in meta.json does not work for me. Could you advise on how to solve the above problem and rename the model? As a quick fix, I will simply overwrite the existing en_core_web_lg model, though that will be inconvenient in the long term.

Many thanks for this great application!

Hi Jurre,

Thanks for the detailed report. I think this bug might have been fixed in v2.1.0a0, if you want to try that version of spaCy. It’s an alpha, and you’ll have to download a new model — but hopefully it’ll take care of this problem.

The problem occurs because the vectors were being loaded under a name stored in a global variable, which breaks down when multiple models are loaded. When we fixed this bug, we wanted to make sure we didn’t break compatibility with existing models, but the logic for that backwards-compatibility patch is a bit brittle, which is probably why renaming the model trips it up.

You might want to check the cfg files in the model’s subdirectories. It’s possible the values there are the incorrect ones, leading to the wrong data being written out. Alternatively, overwriting nlp.vocab.vectors.name before saving might fix it too; see the sketch below.
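
Something roughly like this before saving the model might do it; the new vectors name here is just an example:

```python
import spacy

nlp = spacy.load("en_core_web_lg")
# ... add and train the TextCategorizer as before ...

# Give the vectors a name that matches the new package, so the saved
# model no longer points back at "en_core_web_lg.vectors"
nlp.vocab.vectors.name = "my_textcat_model.vectors"

nlp.to_disk("/tmp/my_textcat_model")
```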