Custom vectors loading issue

Hi,

I'm running into an issue when I try to load custom FastText vectors into my model. These are the steps (a rough code sketch follows the list):

  1. Load the base en_core_web_sm model.
  2. Use spacy.cli.init_model.add_vectors to add FastText vectors (stored as a .vec.gz file) to the model.
  3. Disable all pipeline components except NER (i.e. the tagger & parser).
  4. Train NER.
  5. End training and restore the disabled components.
  6. Get a final evaluation score with nlp.evaluate().
  7. Save to disk with nlp.to_disk().
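
Roughly, in code (spaCy 2.2.x API; the paths, training data and hyperparameters below are placeholders, and the exact add_vectors signature may differ slightly between 2.x releases):

import random
from pathlib import Path

import spacy
from spacy.cli.init_model import add_vectors
from spacy.util import minibatch

# Placeholder training data in the usual (text, annotations) format.
TRAIN_DATA = [
    ("Apple is looking at buying a U.K. startup.", {"entities": [(0, 5, "ORG")]}),
    # ... more examples ...
]

# 1. Load the base model.
nlp = spacy.load("en_core_web_sm")

# 2. Add the FastText vectors to the model's vocab.
add_vectors(nlp, Path("custom_vectors.vec.gz"), prune_vectors=-1)

# 3.-5. Disable everything except NER while training; the disabled
# components are restored when the context manager exits.
other_pipes = [p for p in nlp.pipe_names if p != "ner"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.resume_training()
    for i in range(10):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for batch in minibatch(TRAIN_DATA, size=8):
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, losses=losses)

# 6.-7. Evaluate (on a held-out dev set in practice) and save.
scorer = nlp.evaluate(TRAIN_DATA)
print(scorer.ents_f)
nlp.to_disk("output_model")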

This all works fine. However, when I later try to re-load the model from disk, I get this error:

Traceback (most recent call last):
  File "nn_parser.pyx", line 671, in spacy.syntax.nn_parser.Parser.from_disk
  File "/opt/venv/lib/python3.7/site-packages/thinc/neural/_classes/model.py", line 375, in from_bytes
    dest = getattr(layer, name)
AttributeError: 'FunctionLayer' object has no attribute 'vectors'

...

  File "/opt/venv/lib/python3.7/site-packages/spacy/language.py", line 936, in <lambda>
    p, exclude=["vocab"]
  File "nn_parser.pyx", line 673, in spacy.syntax.nn_parser.Parser.from_disk
ValueError: [E149] Error deserializing model. Check that the config used to create the component matches the model being loaded.
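
The re-load itself is nothing special (the path is a placeholder):

import spacy

# This is the call that raises the E149 error above.
nlp = spacy.load("output_model")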

I'm guessing that loading the parser/tagger components is causing this, because somehow they expect vectors to exist where they don't. My meta.json file includes the following vector info:

  "vectors": {
    "width": 300,
    "vectors": 766082,
    "keys": 766082,
    "name": "en_model.vectors"
  },

Meanwhile, my parser's cfg contains the following:

{
  "beam_width":1,
  "beam_density":0.0,
  "beam_update_prob":1.0,
  "cnn_maxout_pieces":3,
  "nr_feature_tokens":8,
  "deprecation_fixes":{
    "vectors_name":null
  },
  "learn_tokens":false,
  "nr_class":107,
  "hidden_depth":1,
  "token_vector_width":96,
  "hidden_width":64,
  "maxout_pieces":2,
  "pretrained_vectors":null,
  "bilstm_depth":0,
  "self_attn_depth":0,
  "conv_depth":4,
  "conv_window":4,
  "embed_size":2000
}

I'm using spaCy 2.2.3. The whole process (load, train, save, re-load) works if I don't add any vectors, so it's not a version mismatch issue.

Any idea what's causing this?

This is an area of spaCy we're eager to improve (and we have something we're very keen to launch soon!). The general problem is that the system for passing config through the different components is very brittle: defaults can be inserted at various points along the path, and this leads to lots of bugs.

The specific bug here is that the parser has ended up expecting vectors, I guess because there are vectors loaded onto the nlp object. There's no conceptual reason why you shouldn't have vectors in the NER component and none in the parser; it's just that the config setting is being passed around incorrectly.

It looks to me like the culprit is the _fix_pretrained_vectors_name function in spacy.language. That function was added to correct an earlier problem with vector naming without forcing users to re-download their models, and I think it's now causing this problem.
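
If you want to confirm that's what's going on, a quick check on the nlp object just before you call to_disk is to compare the vectors on the vocab with the pretrained_vectors setting each component's config ended up with, roughly like this:

# Quick diagnostic: the vocab has vectors, but each component's cfg
# records whether it was built to use them.
print("vocab vectors:", nlp.vocab.vectors.name, nlp.vocab.vectors.shape)
for name, pipe in nlp.pipeline:
    cfg = getattr(pipe, "cfg", {})
    print(name, "pretrained_vectors =", cfg.get("pretrained_vectors"))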

You might be able to simply monkey-patch the function out, like this:

import spacy.language
# Undo this backwards compatibility hack, as it interferes with having
# some components use vectors but not others.
spacy.language._fix_pretrained_vectors_name = lambda nlp: nlp
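
Just make sure the patch runs before the spacy.load() call that re-loads your saved model, since it needs to be in place when the pipeline components are deserialized.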

That took care of it, thank you. Excited to see what's being launched soon!