Is there any solution for this issue? Even after I've changed the batch size, it's not working.

```
Components: spancat
Merging training and evaluation data for 1 components
  • [spancat] Training: 132 | Evaluation: 33 (20% split)
    Training: 132 | Evaluation: 33
    Labels: spancat (5)

ℹ Pipeline: ['tok2vec', 'spancat']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE

⚠ Aborting and saving the final best model. Encountered exception:
OutOfMemoryError('Out of memory allocating 1,876,703,232 bytes (allocated so
far: 6,663,892,992 bytes).')
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/harsh/.local/lib/python3.8/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/harsh/.local/lib/python3.8/site-packages/plac_core.py", line 367, in __call__
    cmd, result = parser.consume(arglist)
  File "/home/harsh/.local/lib/python3.8/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/harsh/.local/lib/python3.8/site-packages/prodigy/recipes/train.py", line 278, in train
    return _train(
  File "/home/harsh/.local/lib/python3.8/site-packages/prodigy/recipes/train.py", line 198, in _train
    spacy_train(nlp, output_path, use_gpu=gpu_id, stdout=stdout)
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/training/loop.py", line 122, in train
    raise e
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/training/loop.py", line 105, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/training/loop.py", line 203, in train_while_improving
    nlp.update(
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/language.py", line 1164, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])  # type: ignore
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/pipeline/spancat.py", line 346, in update
    backprop_scores(d_scores)  # type: ignore
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/layers/chain.py", line 60, in backprop
    dX = callback(dY)
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/layers/chain.py", line 60, in backprop
    dX = callback(dY)
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/layers/concatenate.py", line 67, in backprop
    gradient = bwd(dY)
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/layers/reduce_mean.py", line 26, in backprop
    return Ragged(model.ops.backprop_reduce_mean(dY, lengths), lengths)
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/backends/cupy_ops.py", line 235, in backprop_reduce_mean
    return _custom_kernels.backprop_reduce_mean(d_means, lengths)
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/backends/_custom_kernels.py", line 318, in backprop_reduce_mean
    out = cupy.zeros((T, O), dtype="f")
  File "/home/harsh/.local/lib/python3.8/site-packages/cupy/_creation/basic.py", line 211, in zeros
    a = cupy.ndarray(shape, dtype, order=order)
  File "cupy/_core/core.pyx", line 171, in cupy._core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 698, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1375, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1396, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1076, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1097, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 1335, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 1,876,703,232 bytes (allocated so far: 6,663,892,992 bytes).
```

Hi there.

Could you share the commands that you ran? Without understanding what you tried to run, it's hard to drill deeper and understand what went wrong.

From looking at the traceback it seems like you're hitting an out of memory issue. Are you trying to run on a GPU? Could you share your operating system, Python, spaCy and Prodigy versions?
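
For reference, these two standard CLI commands print most of those details:

```
python3 -m spacy info
python3 -m prodigy stats
```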


```
python3 -m prodigy train "./model_text_span_20_jun/" --spancat text-span-softskill --gpu-id 0 --verbose
```

Python: 3.8.10
spaCy: 3.3.1

How large is your dataset? Do you have very long examples? It's been suggested on GitHub that you may get it to run by making the batch size smaller and/or setting `max_batch_items` smaller.

You can override the config in the Prodigy train command just as you would in the spaCy train command. You might try running it again with `--training.batch_size 8` at the end. Does that help?
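
Spelled out against the command you shared earlier, that would be:

```
python3 -m prodigy train "./model_text_span_20_jun/" --spancat text-span-softskill --gpu-id 0 --training.batch_size 8
```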

Also, do you see the same error if you don't connect a GPU?


No, it's showing me this error after changing the batch size:

```
=========================== Initializing pipeline ===========================
✘ Config validation error
training -> batch_size   extra fields not permitted

{'dev_corpus': 'corpora.dev', 'train_corpus': 'corpora.train', 'seed': 0,
 'gpu_allocator': None, 'dropout': 0.1, 'accumulate_gradient': 1,
 'patience': 1600, 'max_epochs': 0, 'max_steps': 20000, 'eval_frequency': 200,
 'frozen_components': [], 'annotating_components': [],
 'before_to_disk': {'@misc': 'prodigy.todisk_cleanup.v1'},
 'logger': {'@loggers': 'prodigy.ConsoleLogger.v1'}, 'batch_size': 8,
 'batcher': {'@batchers': 'spacy.batch_by_words.v1', 'discard_oversize': False,
  'tolerance': 0.2, 'get_length': None,
  'size': {'@schedules': 'compounding.v1', 'start': 100, 'stop': 1000,
   'compound': 1.001, 't': 0.0}},
 'optimizer': {'@optimizers': 'Adam.v1', 'beta1': 0.9, 'beta2': 0.999,
  'L2_is_weight_decay': True, 'L2': 0.01, 'grad_clip': 1.0,
  'use_averages': False, 'eps': 1e-08, 'learn_rate': 0.001},
 'score_weights': {'spans_sc_f': 1.0, 'spans_sc_p': 0.0, 'spans_sc_r': 0.0}}
```

Ah! My bad. Could you try:

```
--training.batcher.size.start 8
```

I just checked the default configuration and these are the available settings for the batch size:

```
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
```
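
Note that with this schedule the batch size compounds from `start` up to `stop`, so if memory stays tight you could lower the ceiling as well. For example (the `stop` value here is just an illustrative guess):

```
python3 -m prodigy train "./model_text_span_20_jun/" --spancat text-span-softskill --gpu-id 0 --training.batcher.size.start 8 --training.batcher.size.stop 100
```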

Thanks @koaning Vincent for helping me, but it's still not working on the GPU because my span size is 96. I'm attaching the error again for your reference:

```
python3 -m prodigy train "./span_20_jun_2022_v1/" --spancat text-span-softskill --gpu-id 0 --training.batcher.size.start 8
ℹ Using GPU: 0
/home/harsh/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning:
NVIDIA GeForce RTX 3070 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3070 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
Using 'spacy.ngram_range_suggester.v1' for 'spancat' with sizes 1 to 96 (inferred from data)
✔ Generated training config

=========================== Initializing pipeline ===========================
[2022-06-21 13:56:02,879] [INFO] Set up nlp object from config
Components: spancat
Merging training and evaluation data for 1 components
  • [spancat] Training: 148 | Evaluation: 36 (20% split)
    Training: 148 | Evaluation: 36
    Labels: spancat (5)
[2022-06-21 13:56:03,231] [INFO] Pipeline: ['tok2vec', 'spancat']
[2022-06-21 13:56:03,233] [INFO] Created vocabulary
[2022-06-21 13:56:03,233] [INFO] Finished initializing nlp object
[2022-06-21 13:56:05,770] [INFO] Initialized pipeline components: ['tok2vec', 'spancat']
✔ Initialized pipeline

============================= Training pipeline =============================
Components: spancat
Merging training and evaluation data for 1 components
  • [spancat] Training: 148 | Evaluation: 36 (20% split)
    Training: 148 | Evaluation: 36
    Labels: spancat (5)
ℹ Pipeline: ['tok2vec', 'spancat']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE

⚠ Aborting and saving the final best model. Encountered exception:
OutOfMemoryError('Out of memory allocating 2,098,403,328 bytes (allocated so
far: 186,237,952 bytes).')
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/harsh/.local/lib/python3.8/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/harsh/.local/lib/python3.8/site-packages/plac_core.py", line 367, in __call__
    cmd, result = parser.consume(arglist)
  File "/home/harsh/.local/lib/python3.8/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/harsh/.local/lib/python3.8/site-packages/prodigy/recipes/train.py", line 278, in train
    return _train(
  File "/home/harsh/.local/lib/python3.8/site-packages/prodigy/recipes/train.py", line 198, in _train
    spacy_train(nlp, output_path, use_gpu=gpu_id, stdout=stdout)
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/training/loop.py", line 122, in train
    raise e
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/training/loop.py", line 105, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/training/loop.py", line 203, in train_while_improving
    nlp.update(
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/language.py", line 1164, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])  # type: ignore
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/pipeline/spancat.py", line 344, in update
    scores, backprop_scores = self.model.begin_update((docs, spans))
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/model.py", line 309, in begin_update
    return self._func(self, X, is_train=True)
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/layers/chain.py", line 54, in forward
    Y, inc_layer_grad = layer(X, is_train=is_train)
  File "/home/harsh/.local/lib/python3.8/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/home/harsh/.local/lib/python3.8/site-packages/spacy/ml/extract_spans.py", line 32, in forward
    Y = Ragged(X.dataXd[indices], spans.dataXd[:, 1] - spans.dataXd[:, 0])  # type: ignore[arg-type, index]
  File "cupy/_core/core.pyx", line 1437, in cupy._core.core.ndarray.__getitem__
  File "cupy/_core/_routines_indexing.pyx", line 43, in cupy._core._routines_indexing._ndarray_getitem
  File "cupy/_core/core.pyx", line 782, in cupy._core.core.ndarray.take
  File "cupy/_core/_routines_indexing.pyx", line 144, in cupy._core._routines_indexing._ndarray_take
  File "cupy/_core/_routines_indexing.pyx", line 834, in cupy._core._routines_indexing._take
  File "cupy/_core/core.pyx", line 171, in cupy._core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 698, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1375, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1396, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1076, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1097, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 1335, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 2,098,403,328 bytes (allocated so far: 186,237,952 bytes).
```

@ines

Please put code/tracebacks in code blocks (by wrapping them in ```). That way, it's easier to look through them and they will also become scrollable, saving screen real estate.

It's becoming more and more clear that this is perhaps less of a Prodigy issue and more of a spaCy/PyTorch issue. One question: a span of size 96 feels very large. Are you using a custom config? Your previous commands suggest that you weren't, and the standard config has these span sizes:

```
[components.spancat.suggester]
@misc = "spacy.ngram_suggester.v1"
sizes = [1,2,3]
```

Is there a reason why your span size really needs to be that big?
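
For what it's worth, if a few long spans are genuinely needed, one option is to train with a custom config and cap what the suggester proposes. A sketch, assuming you export and edit the generated config (the `max_size` of 10 is just an illustrative ceiling, not a recommendation):

```
[components.spancat.suggester]
@misc = "spacy.ngram_range_suggester.v1"
min_size = 1
max_size = 10
```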

Okay, is there any way to limit those large span sizes while annotating?

I'm perhaps a bit confused. My impression is that you've been annotating the data such that there are spans that have 96 tokens in them. If you want to prevent this, you should not create large spans when you annotate.

Do you have an example of such a large span? Are you using a custom config?
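
If it helps you check, here's a rough sketch that scans a Prodigy dataset for unusually long spans via the database API. It assumes the dataset name from your commands above and that the annotations store `token_start`/`token_end` (inclusive), as Prodigy's span recipes do; the 20-token threshold is arbitrary:

```python
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("text-span-softskill")  # dataset name from the commands above

for eg in examples:
    for span in eg.get("spans", []):
        if "token_start" not in span or "token_end" not in span:
            continue  # skip spans without token offsets
        n_tokens = span["token_end"] - span["token_start"] + 1  # token_end is inclusive
        if n_tokens > 20:  # arbitrary threshold; adjust to taste
            print(n_tokens, span.get("label"), eg["text"][span["start"]:span["end"]])
```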