Hi,
I am running prodigy train with en_core_web_trf as the base model and hitting a CUDA out-of-memory error. I tried decreasing the batch size in config.cfg and passing it via --config, but the output below says "Using config from base model", so it looks like my config is being ignored and a default one is generated instead?
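For reference, the change I made in prodigy/model/config.cfg was in the batcher block, roughly like this (a sketch from memory, so the exact numbers may differ; the keys are the ones from the stock transformer training config):

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
# lowered from the stock value to try to fit in GPU memory
size = 500
buffer = 256
discard_oversize = true
get_length = null

Here's the command and the full output: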
-> % prodigy train prodigy/model/model_500_fulltext_reviewed2 --ner train_set_full_text500_reviewed2 --base-model en_core_web_trf --eval-split 0.2 --gpu-id 1 --config prodigy/model/config.cfg
2021-12-02 22:18:58.335203: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
ℹ Using GPU: 1
========================= Generating Prodigy config =========================
/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/util.py:833: UserWarning: [W095] Model 'en_core_web_trf' (3.1.0) was trained with spaCy v3.1 and may not be 100% compatible with the current version (3.2.0). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy_transformers/pipeline_component.py:406: UserWarning: Automatically converting a transformer component from spacy-transformers v1.0 to v1.1+. If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spacy-transformers version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
ℹ Using config from base model
✔ Generated training config
=========================== Initializing pipeline ===========================
/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/util.py:833: UserWarning: [W095] Model 'en_core_web_trf' (3.1.0) was trained with spaCy v3.1 and may not be 100% compatible with the current version (3.2.0). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/util.py:833: UserWarning: [W095] Model 'en_core_web_trf' (3.1.0) was trained with spaCy v3.1 and may not be 100% compatible with the current version (3.2.0). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy_transformers/pipeline_component.py:406: UserWarning: Automatically converting a transformer component from spacy-transformers v1.0 to v1.1+. If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spacy-transformers version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/util.py:833: UserWarning: [W095] Model 'en_core_web_trf' (3.1.0) was trained with spaCy v3.1 and may not be 100% compatible with the current version (3.2.0). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy_transformers/pipeline_component.py:406: UserWarning: Automatically converting a transformer component from spacy-transformers v1.0 to v1.1+. If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spacy-transformers version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
[2021-12-02 22:19:11,276] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 403 | Evaluation: 100 (20% split)
Training: 390 | Evaluation: 100
Labels: ner (1)
[2021-12-02 22:19:14,106] [INFO] Pipeline: ['transformer', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
[2021-12-02 22:19:14,106] [INFO] Resuming training for: ['ner', 'transformer']
[2021-12-02 22:19:14,113] [INFO] Created vocabulary
[2021-12-02 22:19:14,138] [INFO] Finished initializing nlp object
[2021-12-02 22:19:14,138] [INFO] Initialized pipeline components: []
✔ Initialized pipeline
============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 403 | Evaluation: 100 (20% split)
Training: 390 | Evaluation: 100
Labels: ner (1)
ℹ Pipeline: ['transformer', 'tagger', 'parser', 'attribute_ruler',
'lemmatizer', 'ner']
ℹ Frozen components: ['tagger', 'parser', 'attribute_ruler',
'lemmatizer']
ℹ Initial learn rate: 0.0
E # LOSS TRANS... LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------- -------- ------ ------ ------ ------
⚠ Aborting and saving the final best model. Encountered exception:
RuntimeError('CUDA out of memory. Tried to allocate 578.00 MiB (GPU 1; 14.76 GiB
total capacity; 11.46 GiB already allocated; 57.75 MiB free; 12.01 GiB reserved
in total by PyTorch)')
Traceback (most recent call last):
File "/fn/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/fn/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/prodigy/__main__.py", line 61, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 331, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/prodigy/recipes/train.py", line 277, in train
return _train(
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/prodigy/recipes/train.py", line 197, in _train
spacy_train(nlp, output_path, use_gpu=gpu_id, stdout=stdout)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/training/loop.py", line 122, in train
raise e
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/training/loop.py", line 105, in train
for batch, info, is_best_checkpoint in training_step_iterator:
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/training/loop.py", line 224, in train_while_improving
score, other_scores = evaluate()
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/training/loop.py", line 281, in evaluate
scores = nlp.evaluate(dev_corpus(nlp))
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/language.py", line 1409, in evaluate
for doc, eg in zip(
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/util.py", line 1599, in _pipe
yield from proc.pipe(docs, **kwargs)
File "spacy/pipeline/trainable_pipe.pyx", line 79, in pipe
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy/util.py", line 1618, in raise_error
raise e
File "spacy/pipeline/trainable_pipe.pyx", line 75, in spacy.pipeline.trainable_pipe.TrainablePipe.pipe
File "spacy/pipeline/tagger.pyx", line 141, in spacy.pipeline.tagger.Tagger.predict
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/model.py", line 315, in predict
return self._func(self, X, is_train=False)[0]
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/layers/chain.py", line 54, in forward
Y, inc_layer_grad = layer(X, is_train=is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/model.py", line 291, in __call__
return self._func(self, X, is_train=is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/layers/chain.py", line 54, in forward
Y, inc_layer_grad = layer(X, is_train=is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/model.py", line 291, in __call__
return self._func(self, X, is_train=is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/layers/chain.py", line 54, in forward
Y, inc_layer_grad = layer(X, is_train=is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/model.py", line 291, in __call__
return self._func(self, X, is_train=is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/spacy_transformers/layers/transformer_model.py", line 185, in forward
model_output, bp_tensors = transformer(wordpieces, is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/model.py", line 291, in __call__
return self._func(self, X, is_train=is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/layers/pytorchwrapper.py", line 134, in forward
Ytorch, torch_backprop = model.shims[0](Xtorch, is_train)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/shims/pytorch.py", line 56, in __call__
return self.predict(inputs), lambda a: ...
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/thinc/shims/pytorch.py", line 66, in predict
outputs = self._model(*inputs.args, **inputs.kwargs)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 798, in forward
encoder_outputs = self.encoder(
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 498, in forward
layer_outputs = layer_module(
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 393, in forward
self_attention_outputs = self.attention(
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 321, in forward
self_outputs = self.self(
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yzhong/.virtualenvs/research/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 257, in forward
context_layer = torch.matmul(attention_probs, value_layer)
RuntimeError: CUDA out of memory. Tried to allocate 578.00 MiB (GPU 1; 14.76 GiB total capacity; 11.46 GiB already allocated; 57.75 MiB free; 12.01 GiB reserved in total by PyTorch)
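From the traceback, the OOM is actually raised during evaluation (nlp.evaluate via spacy/training/loop.py), not during the update step, so I'm guessing the training batcher isn't the only setting that matters. If I read the en_core_web_trf config layout correctly, these are the other knobs that affect GPU memory there (my assumption; the values below are just illustrative):

[nlp]
# default batch size for nlp.pipe / nlp.evaluate; my assumption is that
# this is what controls memory during the evaluation step that crashed
batch_size = 32

[components.transformer]
# caps the number of padded wordpiece items per transformer batch
max_batch_items = 1024

Is lowering these the right approach, and what do I need to do so that prodigy train actually uses my --config instead of regenerating one from the base model?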