MemoryError when running terms.train-vectors

Hi, I get a MemoryError when I try to run terms.train-vectors with en_core_web_lg and a medium-sized corpus (14 GB). The command I use is as follows:

python -m prodigy terms.train-vectors ./ekp.spacy "./journal" --loader txt --spacy-model en_core_web_lg --size 300 --n-workers 20 --merge-ents --merge-nps

This is the error message (I lost the upper part of it, sorry):

  File "pipeline.pyx", line 428, in pipe
  File "pipeline.pyx", line 433, in spacy.pipeline.Tagger.predict
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 161, in __call__
    return self.predict(x)
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/api.py", line 55, in predict
    X = layer(X)
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 161, in __call__
    return self.predict(x)
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/api.py", line 293, in predict
    X = layer(layer.ops.flatten(seqs_in, pad=pad))
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 161, in __call__
    return self.predict(x)
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/api.py", line 55, in predict
    X = layer(X)
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 161, in __call__
    return self.predict(x)
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 125, in predict
    y, _ = self.begin_update(X)
  File "/home/astray0924/anaconda3/lib/python3.6/site-packages/thinc/api.py", line 375, in uniqued_fwd
    Y = Y_uniq[inv].reshape((X.shape[0],) + Y_uniq.shape[1:])
MemoryError

The worker has 32 cores and 205 GB of memory, so it's quite decent. I don't think the corpus or the task is too much for this machine.

https://support.prodi.gy/t/terms-trains-crashing/366/37

Hmm… I'm not sure what the conclusion is. So is the operation I'm trying to do simply not possible on my machine? Would more memory make this work?

It appears there is a memory leak, so I found that adding memory does not solve the problem!


So we have to wait until the problem is fixed, right? I see… Thanks for the answer anyway :slight_smile: