MemoryError when saving trained model

Hey,

I'm currently running Prodigy on a VM for text classification. When I use the following command:

prodigy textcat.batch_train review_class en_core_web_lg --output /result --eval-split 0.2

It trains the model and even evaluates it, but then I get an error and can't find the output file in the directory. Here is the full output of the command:

/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192, got 176
  return f(*args, **kwds)
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192, got 176
  return f(*args, **kwds)

Loaded model en_core_web_lg
Using 20% of examples (33) for evaluation
Using 100% of remaining examples (134) for training
Dropout: 0.2  Batch size: 10  Iterations: 10  

#          LOSS       F-SCORE    ACCURACY  
01         4.435      0.815      0.844                                          
02         3.034      0.846      0.875                                          
03         2.621      0.846      0.875                                          
04         2.690      0.828      0.844                                          
05         3.863      0.800      0.812                                          
06         2.543      0.828      0.844                                          
07         2.598      0.815      0.844                                          
08         2.470      0.815      0.844                                          
09         3.204      0.815      0.844                                          
10         2.128      0.846      0.875                                          

MODEL      USER       COUNT     
accept     accept     11        
accept     reject     1         
reject     reject     17        
reject     accept     3         

Correct    28
Incorrect  4

Baseline   0.56      
Precision  0.92      
Recall     0.79      
F-score    0.85      
Accuracy   0.87
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 167, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/prodigy/recipes/textcat.py", line 154, in batch_train
    nlp = nlp.from_bytes(best_model)
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/spacy/language.py", line 696, in from_bytes
    msg = util.from_bytes(bytes_data, deserializers, {})
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/spacy/util.py", line 493, in from_bytes
    setter(msg[key])
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/spacy/language.py", line 687, in <lambda>
    self.vocab.from_bytes(b) and _fix_pretrained_vectors_name(self))),
  File "vocab.pyx", line 421, in spacy.vocab.Vocab.from_bytes
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/spacy/util.py", line 493, in from_bytes
    setter(msg[key])
  File "vocab.pyx", line 419, in spacy.vocab.Vocab.from_bytes.lambda4
  File "vocab.pyx", line 415, in spacy.vocab.Vocab.from_bytes.serialize_vectors
  File "vectors.pyx", line 428, in spacy.vectors.Vectors.from_bytes
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/spacy/util.py", line 493, in from_bytes
    setter(msg[key])
  File "vectors.pyx", line 422, in spacy.vectors.Vectors.from_bytes.deserialize_weights
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/msgpack_numpy.py", line 187, in unpackb
    return _unpacker.unpackb(packed, encoding=encoding, **kwargs)
  File "msgpack/_unpacker.pyx", line 200, in msgpack._unpacker.unpackb
  File "/home/prodigy/pgy-env/lib/python3.6/site-packages/msgpack_numpy.py", line 84, in decode
    dtype=np.dtype(descr)).reshape(obj[b'shape'])
 
MemoryError

So how do I solve this error and get the output file?

Thanks

How much memory does your VM have? Since you're using the large en_core_web_lg model, it's possible that you're running out of memory at the very end of training, when the best model is deserialized (the nlp.from_bytes call in your traceback) before being saved out to disk.
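If you're not sure, you can check from the VM's shell; on most Linux systems, for example:

free -h

The lg model's word vectors alone take up several hundred MB once loaded, and the recipe also holds a serialized copy of the best model in memory (the best_model in your traceback), so a small VM can easily run out of headroom right at that step.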

Yes, I resolved this issue by using the smaller en_core_web_sm spaCy model and was able to extract the output file.
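In case it helps anyone else, the only change was swapping the model name in the same command:

prodigy textcat.batch_train review_class en_core_web_sm --output /result --eval-split 0.2

(This assumes en_core_web_sm is already installed, e.g. via python -m spacy download en_core_web_sm.)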