Can't load model if trained for NER and TEXTCAT

Hi!

I’m having an issue when attempting to load any model that’s been trained for text categorisation in addition to NER.

I’m starting by loading up a customised ‘en_core_web_lg’ model as follows:

import spacy
from spacy.pipeline import EntityRuler, Sentencizer

nlp = spacy.load('en_core_web_lg')
sentencizer = Sentencizer(punct_chars=['^',' ^ ','^ ','^ '])
ruler = EntityRuler(nlp, overwrite_ents=True)
# Raw strings keep regex escapes such as \b and \d intact.
ruler.add_patterns([{"label": "EMAIL", "pattern": [{'LIKE_EMAIL':True}]},
		    {"label": "MOBILE", "pattern": [{"TEXT": {"REGEX": r"(\+\d{1,3})?([\(\[\d\)\}]{3})?\s?\d{4,5}\s?\d{3}\s?\d{3}"}}]},
		    {"label": "ADDRESS", "pattern": [{"TEXT": {"REGEX": r"\b(\d{1,3}\s)+([A-z\s\,\.]*)?[A-Z]{1,2}\d{1,3}\s?\d?[A-Z]{2}\b"}}]}])
nlp.add_pipe(sentencizer, before="parser")
nlp.add_pipe(ruler, before='ner')
nlp.begin_training()
nlp.to_disk('custom1')

I’m then loading the model into custom ner.teach and textcat.teach recipes and have no issue when training the model on either one. However, once I’ve trained the model so that it includes text categorisation, I can no longer run ner.teach (either my customised version or the built-in ner.teach recipe). It’s fine the other way - once I’ve done batch training for ner and textcat, I can successfully load the model for further textcat.teach.

My environment looks like this:

  • Python 3.6.5
  • spaCy 2.1.4
  • Prodigy 1.8.3
  • Thinc 7.0.4

The trace I get is as follows:

Traceback (most recent call last):
File "/anaconda3/lib/python3.6/pickle.py", line 269, in _getattribute
obj = getattr(obj, subpath)
AttributeError: module 'thinc.linear.linear' has no attribute 'lambda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/anaconda3/lib/python3.6/pickle.py", line 918, in save_global
obj2, parent = _getattribute(module, name)
File "/anaconda3/lib/python3.6/pickle.py", line 272, in _getattribute
.format(name, obj))
AttributeError: Can't get attribute 'lambda' on <module 'thinc.linear.linear' from '/anaconda3/lib/python3.6/site-packages/thinc/linear/linear.cpython-36m-darwin.so'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/anaconda3/lib/python3.6/site-packages/prodigy/__main__.py", line 380, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/anaconda3/lib/python3.6/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/anaconda3/lib/python3.6/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "sentencizer2.py", line 19, in custom_ner
components = teach(dataset=dataset, spacy_model=spacy_model, source=stream, label=label, patterns=patterns)
File "/anaconda3/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 122, in teach
model = EntityRecognizer(nlp, label=label)
File "cython_src/prodigy/models/ner.pyx", line 178, in prodigy.models.ner.EntityRecognizer.__init__
File "/anaconda3/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/anaconda3/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/anaconda3/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/anaconda3/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/anaconda3/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/anaconda3/lib/python3.6/copy.py", line 215, in _deepcopy_list
append(deepcopy(a, memo))
File "/anaconda3/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/anaconda3/lib/python3.6/copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/anaconda3/lib/python3.6/copy.py", line 220, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "/anaconda3/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/anaconda3/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/anaconda3/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/anaconda3/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/anaconda3/lib/python3.6/copy.py", line 169, in deepcopy
rv = reductor(4)
File "/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 96, in __getstate__
return srsly.pickle_dumps(self.__dict__)
File "/anaconda3/lib/python3.6/site-packages/srsly/_pickle_api.py", line 14, in pickle_dumps
return cloudpickle.dumps(data, protocol=protocol)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 954, in dumps
cp.dump(obj)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 284, in dump
return Pickler.dump(self, obj)
File "/anaconda3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 805, in _batch_appends
save(x)
File "/anaconda3/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
File "/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 96, in __getstate__
return srsly.pickle_dumps(self.__dict__)
File "/anaconda3/lib/python3.6/site-packages/srsly/_pickle_api.py", line 14, in pickle_dumps
return cloudpickle.dumps(data, protocol=protocol)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 954, in dumps
cp.dump(obj)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 284, in dump
return Pickler.dump(self, obj)
File "/anaconda3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 419, in save_function
self.save_function_tuple(obj)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 602, in save_function_tuple
save(state)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 805, in _batch_appends
save(x)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/anaconda3/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
File "/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 96, in __getstate__
return srsly.pickle_dumps(self.__dict__)
File "/anaconda3/lib/python3.6/site-packages/srsly/_pickle_api.py", line 14, in pickle_dumps
return cloudpickle.dumps(data, protocol=protocol)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 954, in dumps
cp.dump(obj)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 284, in dump
return Pickler.dump(self, obj)
File "/anaconda3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 805, in _batch_appends
save(x)
File "/anaconda3/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
File "/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 96, in __getstate__
return srsly.pickle_dumps(self.__dict__)
File "/anaconda3/lib/python3.6/site-packages/srsly/_pickle_api.py", line 14, in pickle_dumps
return cloudpickle.dumps(data, protocol=protocol)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 954, in dumps
cp.dump(obj)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 284, in dump
return Pickler.dump(self, obj)
File "/anaconda3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 419, in save_function
self.save_function_tuple(obj)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 602, in save_function_tuple
save(state)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 805, in _batch_appends
save(x)
File "/anaconda3/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
File "/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 96, in __getstate__
return srsly.pickle_dumps(self.__dict__)
File "/anaconda3/lib/python3.6/site-packages/srsly/_pickle_api.py", line 14, in pickle_dumps
return cloudpickle.dumps(data, protocol=protocol)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 954, in dumps
cp.dump(obj)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 284, in dump
return Pickler.dump(self, obj)
File "/anaconda3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 805, in _batch_appends
save(x)
File "/anaconda3/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
File "/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 96, in __getstate__
return srsly.pickle_dumps(self.__dict__)
File "/anaconda3/lib/python3.6/site-packages/srsly/_pickle_api.py", line 14, in pickle_dumps
return cloudpickle.dumps(data, protocol=protocol)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 954, in dumps
cp.dump(obj)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 284, in dump
return Pickler.dump(self, obj)
File "/anaconda3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/anaconda3/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda3/lib/python3.6/pickle.py", line 507, in save
self.save_global(obj, rv)
File "/anaconda3/lib/python3.6/site-packages/srsly/cloudpickle/cloudpickle.py", line 704, in save_global
return Pickler.save_global(self, obj, name=name)
File "/anaconda3/lib/python3.6/pickle.py", line 922, in save_global
(obj, module_name, name))

_pickle.PicklingError: Can't pickle <cyfunction LinearModel.<lambda> at 0x10b199d38>: it's not found as thinc.linear.linear.lambda

Not sure how to solve this! Is there an issue with how I’m initially creating the model which is causing the problem?

Many thanks in advance

Hmm, which version of Prodigy and spaCy are you using? I thought we took care of that pickling error in v2.1, but it’s possible we didn’t.

Hi Matt - thanks very much for coming back to me.

I’m currently on the following versions:

  • Python 3.6.5
  • spaCy 2.1.4
  • Prodigy 1.8.3
  • Thinc 7.0.4

Okay, so the issue is that inside ner.teach we call copy.deepcopy, which frustratingly seems to run a slightly different code path than pickle. So even though we resolved that error for pickle, deepcopy is still a problem. Hmm.
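To make the failure mode a bit more concrete: in the traceback above, copy.deepcopy calls the object's reduce hook (reductor(4)), which lands in thinc's Model.__getstate__, and that is where the attributes get pickled. Here is a minimal stdlib-only sketch of that chain. ToyModel and deepcopy_fails are hypothetical names, and the standard pickle module stands in for srsly/cloudpickle, so this only illustrates the mechanism and does not reproduce the actual thinc bug.

```python
import copy
import pickle


class ToyModel:
    """Hypothetical stand-in for a model that pickles its own attributes in __getstate__."""

    def __init__(self, activation):
        self.activation = activation

    def __getstate__(self):
        # Same shape as the call in the traceback: serialisation happens here.
        return pickle.dumps(self.__dict__)

    def __setstate__(self, state):
        self.__dict__.update(pickle.loads(state))


def deepcopy_fails(obj):
    """Return True if copy.deepcopy(obj) raises a pickling-related error."""
    try:
        copy.deepcopy(obj)
    except (pickle.PicklingError, AttributeError):
        return True
    return False


# deepcopy asks the object for its reduce state, which calls __getstate__,
# so an attribute that cannot be pickled breaks the copy as well.
print(deepcopy_fails(ToyModel(lambda x: x)))  # lambda cannot be pickled by reference
print(deepcopy_fails(ToyModel(abs)))          # built-in function pickles fine
```

The point of the sketch is only that a deepcopy can surface a pickling error even when nothing in the calling code pickles anything explicitly.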

For now, you can work around the problem by adding disabled = nlp.disable_pipes("textcat") near the top of your ner.teach recipe. The ner.teach recipe doesn’t save out the model (it just makes annotations), so you shouldn’t have to worry about re-enabling it.

By the way, for the same reason, you might not need to run ner.teach with a text categorizer in the pipeline at all: the text categorizer won’t be interacting with the NER unless you’ve been doing something custom.

Another thing worth pointing out: the call to nlp.begin_training() in your snippet above might be incorrect. It will reset the weights of the model you just loaded, which likely isn’t what you want.