I'm getting an error after running textcat.batch-train that I think may be related to the size of the starting model I'm using. The starting model is a large (3.3 GB) custom model with aligned word vectors from many languages. Memory usage during training is high (11 GB+), and then I get this error at the end of training, when it tries to save the best model:
...
Baseline 0.63
Precision 0.89
Recall 0.91
F-score 0.90
Accuracy 0.92
Traceback (most recent call last):
File "/Users/ahalterman/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/ahalterman/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/prodigy/__main__.py", line 242, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 150, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/prodigy/recipes/textcat.py", line 154, in batch_train
nlp = nlp.from_bytes(best_model)
File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 671, in from_bytes
msg = util.from_bytes(bytes_data, deserializers, {})
File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/spacy/util.py", line 500, in from_bytes
msg = msgpack.loads(bytes_data, encoding='utf8')
File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/msgpack_numpy.py", line 187, in unpackb
return _unpacker.unpackb(packed, encoding=encoding, **kwargs)
File "msgpack/_unpacker.pyx", line 139, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:2068)
ValueError: 3378393043 exceeds max_bin_len(2147483647)
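
In case it helps confirm the size theory, here's a rough check I can run outside of Prodigy: serialize the pipeline and then load it back the way batch-train appears to (per the `nlp.from_bytes(best_model)` call in the traceback). The model path below is just a placeholder for my custom model, so this is only a sketch.

```python
import spacy

# Placeholder path for the 3.3 GB custom starting model
nlp = spacy.load("/path/to/custom_model")

# Serializing seems to go through, but the byte string is huge
# because of the word vectors
model_bytes = nlp.to_bytes()
print(len(model_bytes))

# Restoring it the way batch-train does (per the traceback) is what fails:
# msgpack refuses any single binary field larger than 2147483647 bytes
# (its default max_bin_len), which the vectors blob exceeds here
nlp.from_bytes(model_bytes)  # raises ValueError: ... exceeds max_bin_len(...)
```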