spaCy model loading regression

Hi folks,
I updated from Prodigy 1.4.0 to Prodigy 1.4.1 and started getting this error whenever I tried to load my spaCy model with custom word vectors:

root@e695e0fd0db6:/# prodigy terms.teach terms custom_model -se some,words,here
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/site-packages/prodigy/__main__.py", line 254, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 152, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/gensim/prodigy_terms.py", line 43, in train_vectors
    nlp = spacy.load(spacy_model)
  File "/usr/local/lib/python3.6/site-packages/spacy/__init__.py", line 15, in load
    return util.load_model(name, **overrides)
  File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 116, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 156, in load_model_from_path
    return nlp.from_disk(model_path)
  File "/usr/local/lib/python3.6/site-packages/spacy/language.py", line 653, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 511, in from_disk
    reader(path / key)
  File "/usr/local/lib/python3.6/site-packages/spacy/language.py", line 649, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
  File "pipeline.pyx", line 643, in spacy.pipeline.Tagger.from_disk
  File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 511, in from_disk
    reader(path / key)
  File "pipeline.pyx", line 626, in spacy.pipeline.Tagger.from_disk.load_model
  File "pipeline.pyx", line 627, in spacy.pipeline.Tagger.from_disk.load_model
  File "/usr/local/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 351, in from_bytes
    dest = getattr(layer, name)
AttributeError: 'FunctionLayer' object has no attribute 'vectors'
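For context, the model in question was built roughly along these lines (a minimal sketch of what my prodigy_terms.py does; the vectors file name and output path are placeholders, and the exact gensim calls may differ from my actual script):

import spacy
from gensim.models import KeyedVectors

# Start from the small English model as the base.
nlp = spacy.load("en_core_web_sm")

# Load custom word2vec vectors trained with gensim (placeholder file name).
word2vec = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Copy each gensim vector into the spaCy vocab.
for word in word2vec.vocab:
    nlp.vocab.set_vector(word, word2vec[word])

# Save the combined model; this is the custom_model passed to terms.teach.
nlp.to_disk("custom_model")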

I tried rebuilding my environment with 1.4.0, but I’m seeing the same regression, so I’m guessing another library changed. I had an old copy of my environment where this error does not occur; here is the diff between the pip freeze output of the two environments:

diff pip.old pip.new
13d12
< ftfy==4.4.3
15d13
< html5lib==1.0b8
39c37
< spacy==2.0.10
---
> spacy==2.0.11
43c41
< tqdm==4.19.8
---
> tqdm==4.21.0
47d44
< wcwidth==0.1.7

Based on this, I’m guessing this is a regression in spacy==2.0.11? I’ll see whether downgrading to spacy==2.0.10 addresses the issue…

UPDATE: I’ve verified that downgrading spacy to 2.0.10 addresses this regression.
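For anyone following along, a quick way to confirm which spaCy version is active before re-running the recipe:

import spacy

print(spacy.__version__)  # expect 2.0.10 after the downgrade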

Interestingly, it looks like a very similar issue was fixed in an older version of spaCy (v2.0.6) but is now back again? https://github.com/explosion/spaCy/issues/1727

@beckerfuffle Hmm! Sorry about this. This relates to a recent fix in v2.0.11: https://github.com/explosion/spaCy/issues/1660

There’s currently a deprecation fix in place that’s supposed to allow the current code to work with existing models, which don’t declare the vectors properly in their cfg files. I think this deprecation fix is what’s causing the error.

Would you be able to paste the contents of the tagger/cfg and meta.json files in your model? It’s difficult to figure out exactly where this is going wrong without seeing which code-path is being followed, and I’m not sure exactly how the model was built.
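If it helps, something like this should print both files (assuming the model directory is called custom_model, as in your command above):

from pathlib import Path

model_path = Path("custom_model")  # path to the saved model directory
print((model_path / "tagger" / "cfg").read_text())
print((model_path / "meta.json").read_text())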

Just a reminder that I built this model using the core_web_sm model as a base (see Loading gensim word2vec vectors for terms.teach?):

tagger/cfg:

{
  "cnn_maxout_pieces":2,
  "pretrained_dims":0
}

meta.json:

{
  "lang":"en",
  "pipeline":[
    "tagger",
    "parser",
    "ner"
  ],
  "accuracy":{
    "token_acc":99.8698372794,
    "ents_p":84.9664503965,
    "ents_r":85.6312524451,
    "uas":91.7237657538,
    "tags_acc":97.0403350292,
    "ents_f":85.2975560875,
    "las":89.800872413
  },
  "name":"core_web_sm",
  "license":"CC BY-SA 3.0",
  "author":"Explosion AI",
  "url":"https://explosion.ai",
  "vectors":{
    "width":100,
    "vectors":1081412,
    "keys":1081412
  },
  "sources":[
    "OntoNotes 5",
    "Common Crawl"
  ],
  "version":"2.0.0",
  "spacy_version":">=2.0.0a18",
  "parent_package":"spacy",
  "speed":{
    "gpu":null,
    "nwords":291344,
    "cpu":5122.3040471407
  },
  "email":"contact@explosion.ai",
  "description":"English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities."
}

Thanks! I think I see the problem. I bet I’m checking whether the pretrained_dims key exists somewhere, instead of checking the value isn’t 0.
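Schematically, something like this (not the actual spaCy source, just an illustration of the suspected logic error):

# The tagger cfg declares pretrained_dims, but with a value of 0.
cfg = {"cnn_maxout_pieces": 2, "pretrained_dims": 0}

# Suspected buggy check: the key exists, so the vectors code path runs
# even though there are no pretrained dimensions to link.
takes_vectors_path = "pretrained_dims" in cfg          # True -> wrong path

# Intended check: only take the vectors path for a non-zero value.
takes_vectors_path = bool(cfg.get("pretrained_dims"))  # False -> correct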


@beckerfuffle Just uploaded a new dev build. Would you mind doing pip install spacy==2.0.12.dev0 and checking whether it resolves the problem?


Yup that fixed it! I’ll use 2.0.12.dev0 for now!