Load error after adding custom textcat model to the pipeline

Sorry if this is more of a spaCy question than a Prodigy one.
I am trying to add a custom textcat model to the pipeline and save/load the resulting pipeline.

Using spaCy's example/train_textcat.py and the original textcat model as a basis, I changed lines 38+ in train_textcat.py from

    textcat = nlp.create_pipe('textcat')
    nlp.add_pipe(textcat, last=True)

to

    textcat = MyTextCategorizer(nlp.vocab)
    nlp.add_pipe(textcat, last=True)

with MyTextCategorizer for now being:

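    # Imports assume the spaCy v2.0.x / thinc 6.x module layout.
    from thinc.v2v import Model, Affine, ReLu
    from thinc.t2t import ParametricAttention
    from thinc.t2v import Pooling, sum_pool
    from thinc.misc import Residual
    from thinc.api import chain, add, concatenate, clone
    from thinc.api import flatten_add_lengths, with_getitem
    from spacy._ml import SpacyVectors, zero_init, logistic
    from spacy.pipeline import TextCategorizer


    class MyTextCategorizer(TextCategorizer):
        def __init__(self, vocab, model=True, **cfg):
            super(MyTextCategorizer, self).__init__(vocab, model=model, **cfg)

        @classmethod
        def Model(cls, nr_class=1, width=64, **cfg):
            pretrained_dims = cfg.get('pretrained_dims', 0)
            print(pretrained_dims)
            # Map the operators to thinc combinators while building the model.
            with Model.define_operators({'>>': chain, '+': add, '|': concatenate,
                                         '**': clone}):
                model = (
                    SpacyVectors
                    >> flatten_add_lengths
                    >> with_getitem(0, Affine(width, pretrained_dims))
                    >> ParametricAttention(width)
                    >> Pooling(sum_pool)
                    >> Residual(ReLu(width, width)) ** 2
                    >> zero_init(Affine(nr_class, width, drop_factor=0.0))
                    >> logistic
                )
                return model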

I ran the training with en_core_web_lg as the base model.

The newly trained model classifies an example sentence correctly.

However, after saving the pipeline and loading it again, I get this error:

    Saved model to model
    Loading from model
    Traceback (most recent call last):
      File "spacy_textcat/train.py", line 213, in <module>
        plac.call(main)
      File "/usr/local/lib/python2.7/site-packages/plac_core.py", line 328, in call
        cmd, result = parser.consume(arglist)
      File "/usr/local/lib/python2.7/site-packages/plac_core.py", line 207, in consume
        return cmd, self.func(*(args + varargs + extraopts), **kwargs)
      File "spacy_textcat/train.py", line 127, in main
        nlp2 = spacy.load(output_dir)
      File "/usr/local/lib/python2.7/site-packages/spacy/__init__.py", line 19, in load
        return util.load_model(name, **overrides)
      File "/usr/local/lib/python2.7/site-packages/spacy/util.py", line 117, in load_model
        return load_model_from_path(name, **overrides)
      File "/usr/local/lib/python2.7/site-packages/spacy/util.py", line 157, in load_model_from_path
        return nlp.from_disk(model_path)
      File "/usr/local/lib/python2.7/site-packages/spacy/language.py", line 629, in from_disk
        util.from_disk(path, deserializers, exclude)
      File "/usr/local/lib/python2.7/site-packages/spacy/util.py", line 520, in from_disk
        reader(path / key)
      File "/usr/local/lib/python2.7/site-packages/spacy/language.py", line 625, in <lambda>
        deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
      File "pipeline.pyx", line 211, in spacy.pipeline.Pipe.from_disk
      File "/usr/local/lib/python2.7/site-packages/spacy/util.py", line 520, in from_disk
        reader(path / key)
      File "pipeline.pyx", line 204, in spacy.pipeline.Pipe.from_disk.load_model
      File "/usr/local/lib/python2.7/site-packages/thinc/neural/_classes/model.py", line 351, in from_bytes
        dest = getattr(layer, name)
    AttributeError: 'FeedForward' object has no attribute 'Q'

Am I doing something wrong?

No problem! We want Prodigy to be smooth to use, which is one of the reasons we've built it on our own libraries as much as possible. Problems that trace back to spaCy are definitely "in scope" for the forum.

This general category of error means that the model is trying to load weights that don't match the architecture that's loading them. We're trying to find ways to make the error message clearer.

In this case, I think the solution is pretty simple. When you load the model back, it looks like spaCy creates a default text classification model and then tries to load the weights of your custom architecture into it.
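To make the mismatch concrete, here is a toy sketch in plain Python. It is not thinc's actual code: ToyLayer and load_weights are hypothetical names. Weights are saved by attribute name, and loading looks each name up on a freshly built layer, much like the dest = getattr(layer, name) line in the traceback above. A layer built from the default architecture simply has no attribute for a weight that only the custom architecture defines.

```python
class ToyLayer(object):
    """Hypothetical stand-in for a neural-network layer.

    Each weight is stored as a named attribute (e.g. W, b, Q).
    """

    def __init__(self, **weights):
        for name, value in weights.items():
            setattr(self, name, value)


def load_weights(layer, saved):
    """Copy saved weights into an already-constructed layer, by name.

    Each attribute is looked up first, so weights saved from a different
    architecture fail with an AttributeError instead of being attached.
    """
    for name, value in saved.items():
        getattr(layer, name)  # AttributeError if this layer has no such weight
        setattr(layer, name, value)
    return layer


if __name__ == "__main__":
    # Weights saved from a custom layer that has an attention vector Q ...
    saved = {"Q": [0.5, 0.5]}
    # ... loaded into a freshly built default layer that only has W and b.
    default = ToyLayer(W=[0.0, 0.0], b=[0.0])
    try:
        load_weights(default, saved)
    except AttributeError as err:
        print(err)
```

Loading the same weights into a layer that was built with the matching architecture (one that already has Q) works fine; the error only appears when the loader builds the wrong architecture first.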

I think you should give the name class attribute on your class a different value, like this:

    class MyTextCategorizer(TextCategorizer):
        name = 'my-textcat'

Next you need to register a "factory" for your class by adding an entry to Language.factories:

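    from spacy.language import Language

    # Build MyTextCategorizer whenever a pipeline lists a 'my-textcat' component.
    Language.factories['my-textcat'] = lambda nlp, **cfg: MyTextCategorizer(nlp.vocab, **cfg)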

When you call spacy.load(), it looks up the component names in the model directory's meta.json. It then selects a factory for each component by looking up its name in Language.factories. Your custom component inherited the name textcat from TextCategorizer, so the built-in factory was selected, and it created the default architecture, which doesn't match your saved weights.
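The name-based lookup described above can be modelled in a few lines of plain Python. This is a simplified sketch, not spaCy's loading code: BuiltinTextcat, CustomWithoutName, CustomTextcat, FACTORIES, save_meta and load_pipeline are hypothetical stand-ins for TextCategorizer, your subclass, Language.factories, the "pipeline" list in meta.json and spacy.load().

```python
class BuiltinTextcat(object):
    """Stand-in for spaCy's TextCategorizer (builds the default model)."""
    name = "textcat"


class CustomWithoutName(BuiltinTextcat):
    """Custom subclass that did not override `name`: it is still 'textcat'."""


class CustomTextcat(BuiltinTextcat):
    """Custom subclass with its own name, as suggested above."""
    name = "my-textcat"


# Simplified registry, playing the role of Language.factories.
FACTORIES = {
    "textcat": BuiltinTextcat,
    "my-textcat": CustomTextcat,
}


def save_meta(components):
    """Record each component's name, like the "pipeline" list in meta.json."""
    return {"pipeline": [component.name for component in components]}


def load_pipeline(meta, factories=FACTORIES):
    """Recreate each component from the factory registered under its name."""
    return [factories[name]() for name in meta["pipeline"]]


if __name__ == "__main__":
    for component in (CustomWithoutName(), CustomTextcat()):
        loaded = load_pipeline(save_meta([component]))[0]
        print(type(component).__name__, "->", type(loaded).__name__)
```

Running it prints "CustomWithoutName -> BuiltinTextcat" and then "CustomTextcat -> CustomTextcat": only the renamed subclass survives the save/load round trip, and only because a factory is registered under its new name.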

I am facing a similar issue while loading a custom text classification model. Could you please tell me where I would find the files where these changes need to be made? Are these spaCy files?

@aman7ronaldo What’s the exact error you’re seeing? Considering that this thread is very old, it might actually be a different issue.

This is the error I get when loading the model:

    'FeedForward' object has no attribute 'W'

This thread discusses a similar issue. How can I resolve it?


I have the same issue. I just retrained my textcat models with the new Prodigy 1.8.3, and when I try to load them in spaCy as before, I get this error about the FeedForward attribute W.

@arnicas Which spaCy version are you loading the model with?

Bingo, sorry: that repo didn't have the updated spaCy. One other oddity I noticed is that the doc category label was converted to UPPERCASE even though my dataset labels weren't uppercase. That was easy to spot, though!
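Since the 'W' error here came down to loading the model with an older spaCy, a quick sanity check is to compare the installed version with the spacy_version entry in the model's meta.json. This is a minimal sketch that assumes the spaCy v2 convention that models are compatible within the same major.minor release; required_spacy_version, major_minor and probably_compatible are hypothetical helper names.

```python
import json
import re
from pathlib import Path


def required_spacy_version(model_dir):
    """Return the "spacy_version" entry from a model's meta.json, or None."""
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    return meta.get("spacy_version")


def major_minor(version):
    """Extract (major, minor) from strings such as "2.1.4" or ">=2.1.0"."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("no version number found in %r" % version)
    return int(match.group(1)), int(match.group(2))


def probably_compatible(required, installed):
    """Heuristic (assumption): same major.minor as the model was saved with."""
    return major_minor(required) == major_minor(installed)
```

In practice you would pass spacy.__version__ as installed and the result of required_spacy_version() on your model directory as required; a model saved with spaCy 2.1 and loaded with 2.0.x fails this check.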