Memory Error

Hi again,

We’ve recently spun up a new VM to train a spaCy model on a new set of annotations collected with ner.teach and ner.manual (written to the same dataset). We’re hitting some sort of memory error when the model is stored as a bytes object. I’m wondering if you have any insight into whether this is a config issue within spaCy or something in the environment.

The output from training is below. We’ve tried a couple of times with varying parameters (n=10 in one case). The model gets through all of the iterations (so the computation itself completes without issue) but errors out when writing the model to the specified output directory. Any ideas?

newvm@Spacy:~/ner/annotations/0216$ python3 -m prodigy ner.batch-train ner_set_0216 en_core_web_lg -n 20 --output /home/bdsdev/ner/models/model_3
Using 100% of remaining examples (1082) for training
Dropout: 0.2 Batch size: 32 Iterations: 20

BEFORE 0.332
Correct 476
Incorrect 957
Entities 1578
Unknown 654

LOSS RIGHT WRONG ENTS SKIP ACCURACY

01 16.588 572 525 1073 0 0.521
02 14.218 677 418 1033 0 0.618
03 13.778 698 376 1027 0 0.650
04 13.385 737 333 1062 0 0.689
05 12.826 752 316 1087 0 0.704
06 12.252 757 300 1076 0 0.716
07 11.525 756 304 1101 0 0.713
08 11.006 758 297 1152 0 0.718
09 11.047 768 285 1139 0 0.729
10 10.116 769 280 1181 0 0.733
11 9.520 777 274 1201 0 0.739
12 9.517 768 282 1263 0 0.731
13 9.180 768 273 1323 0 0.738
14 8.783 770 281 1361 0 0.733
15 8.852 755 290 1376 0 0.722
16 8.650 763 283 1298 0 0.729
17 8.373 773 276 1279 0 0.737
18 8.514 768 273 1231 0 0.738
19 8.123 763 271 1306 0 0.738
20 7.912 771 265 1302 0 0.744

Correct 771
Incorrect 265
Baseline 0.332
Accuracy 0.744
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bdsdev/.local/lib/python3.5/site-packages/prodigy/__main__.py", line 248, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 150, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/bdsdev/.local/lib/python3.5/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/bdsdev/.local/lib/python3.5/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/bdsdev/.local/lib/python3.5/site-packages/prodigy/recipes/ner.py", line 376, in batch_train
    model.from_bytes(best_model)
  File "cython_src/prodigy/models/ner.pyx", line 393, in prodigy.models.ner.EntityRecognizer.from_bytes
  File "/home/bdsdev/.local/lib/python3.5/site-packages/spacy/language.py", line 679, in from_bytes
    msg = util.from_bytes(bytes_data, deserializers, {})
  File "/home/bdsdev/.local/lib/python3.5/site-packages/spacy/util.py", line 503, in from_bytes
    setter(msg[key])
  File "/home/bdsdev/.local/lib/python3.5/site-packages/spacy/language.py", line 669, in <lambda>
    ('vocab', lambda b: self.vocab.from_bytes(b)),
  File "vocab.pyx", line 423, in spacy.vocab.Vocab.from_bytes
  File "/home/bdsdev/.local/lib/python3.5/site-packages/spacy/util.py", line 503, in from_bytes
    setter(msg[key])
  File "vocab.pyx", line 421, in spacy.vocab.Vocab.from_bytes.lambda4
  File "vocab.pyx", line 417, in spacy.vocab.Vocab.from_bytes.serialize_vectors
  File "vectors.pyx", line 408, in spacy.vectors.Vectors.from_bytes
  File "/home/bdsdev/.local/lib/python3.5/site-packages/spacy/util.py", line 503, in from_bytes
    setter(msg[key])
  File "vectors.pyx", line 402, in spacy.vectors.Vectors.from_bytes.deserialize_weights
  File "/home/bdsdev/.local/lib/python3.5/site-packages/msgpack_numpy.py", line 187, in unpackb
    return _unpacker.unpackb(packed, encoding=encoding, **kwargs)
  File "/home/bdsdev/.local/lib/python3.5/site-packages/msgpack/fallback.py", line 124, in unpackb
    ret = unpacker._unpack()
  File "/home/bdsdev/.local/lib/python3.5/site-packages/msgpack/fallback.py", line 600, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/home/bdsdev/.local/lib/python3.5/site-packages/msgpack/fallback.py", line 617, in _unpack
    return bytes(obj)
MemoryError

I suspect the VM might be out of RAM? nlp.to_bytes() with the lg model produces a pretty big bytestring, because of all the word vectors. If msgpack makes a copy of the string to read it in, and the model already has the vectors loaded before it starts deserializing, we end up with multiple copies of this data, which eats up memory fast.
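A quick back-of-the-envelope check makes the suspicion plausible. The figures below are assumptions, not measurements from your VM: en_core_web_lg ships on the order of 685k vectors with 300 float32 dimensions (exact counts vary by version):

```python
# Rough estimate of how much RAM duplicate copies of the vector table cost.
# Assumed figures for en_core_web_lg; not measured on the failing VM.
n_vectors = 685_000   # approximate number of vectors in the lg model
dims = 300            # vector width
bytes_per_float = 4   # float32

one_copy_gb = n_vectors * dims * bytes_per_float / 1024**3
print(f"one copy of the vector table: ~{one_copy_gb:.2f} GiB")

# Loaded model + serialized bytestring + msgpack's intermediate copy:
for copies in (1, 2, 3):
    print(f"{copies} copies: ~{copies * one_copy_gb:.2f} GiB")
```

Three simultaneous copies would already be well over 2 GiB, which is enough to sink a small VM that also holds the training data and Python itself.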

Maybe try changing the call to model.to_bytes() to model.nlp.to_bytes(disable=['vocab'])? Then change model.from_bytes() to model.nlp.from_bytes().
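If you want to experiment with that, the edit to batch_train in prodigy/recipes/ner.py would look roughly like the sketch below. Treat it as a sketch only: the exact attribute names depend on your Prodigy version.

```python
# Hypothetical edit inside batch_train (prodigy/recipes/ner.py), not a tested
# patch. Snapshot the best model without the vocab, so the vector table is
# never serialized or copied by msgpack:
best_model = model.nlp.to_bytes(disable=['vocab'])

# ...and later, when restoring the best model:
model.nlp.from_bytes(best_model)
```

Since the vocab (and its vectors) never changes during training, skipping it in the snapshot loses nothing: the loaded model keeps its original vectors.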

You can also just make the VM bigger, of course – but obviously it’s nice to keep costs lower.

What about using a swap file? (See https://wiki.archlinux.org/index.php/swap.) A 5 GB swap fixed some MemoryErrors during setup on a small demo VM. I know it will be slower, but maybe it’s a simple way to save money in the beginning?
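For reference, setting one up is only a few commands. The path and size here are examples; all of this needs root:

```shell
# Create and enable a 5 GiB swap file (example path; run as root)
fallocate -l 5G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Optionally make it persistent across reboots
echo '/swapfile none swap defaults 0 0' >> /etc/fstab
```

Note that if the vectors get paged out to swap during deserialization, training and model loading can slow down dramatically, so this is a stopgap rather than a fix.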



With 2,500 expressions on a French model, we also see an increase in RAM usage.