I am trying to train a spancat model and it runs out of memory before any actual training starts. I am using the latest versions of spaCy (3.2.3) and Prodigy (1.11.7), and I have installed GPU support (cu113). I have annotated a set of 240 spancat examples, and when I run the training using this command:
My system has 64GB of RAM with 32GB of swap. GPU has 24GB (RTX 3090).
File "/home/mwade/PycharmProjects/DRD_MultiCat_Model/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 54, in forward
Y, inc_layer_grad = layer(X, is_train=is_train)
File "/home/mwade/PycharmProjects/DRD_MultiCat_Model/venv/lib/python3.9/site-packages/thinc/model.py", line 291, in call
return self._func(self, X, is_train=is_train)
File "/home/mwade/PycharmProjects/DRD_MultiCat_Model/venv/lib/python3.9/site-packages/spacy/ml/extract_spans.py", line 32, in forward
Y = Ragged(X.dataXd[indices], spans.dataXd[:, 1] - spans.dataXd[:, 0]) # type: ignore[arg-type, index]
numpy.core._exceptions.MemoryError: Unable to allocate 221. GiB for an array with shape (617037108, 96) and data type float32
Doesn't matter whether I use GPU or not.
Some of my spans can get large: I get a list of names followed by a fixed word/phrase such as "Defendants" or "Plaintiffs". Usually it is a fairly small list, but it can grow to 10-20 names (20-60 tokens).
Is that the reason for my memory error? Is there any way to debug this to find out if something specific is causing it, or to work around it?
It's possible that this is related to the suggester function, which by default will use an ngram range covering all the span lengths available in the data. So if you have really long spans, you'll end up with a lot of potential candidates (e.g. all possible spans between 1 and 60 tokens, which can be a lot). If you run prodigy train with the --verbose flag, it should show you more detailed information on the suggester function used: Span Categorization · Prodigy · An annotation tool for AI, Machine Learning & NLP
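For example, something along these lines (assuming your annotations are in a Prodigy dataset called your_spans_dataset and you want the pipeline saved to ./output; adjust both to your setup):

prodigy train ./output --spancat your_spans_dataset --verbose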
One option to prevent this would be to use a config that defines different logic for the potential span candidates via the suggester function: SpanCategorizer · spaCy API Documentation. How you set this up depends on the data, but there might be common patterns you can use instead of considering every possible combination.
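For instance, if your spans rarely exceed ~10 tokens, you could cap the candidate length by overriding the suggester block in your training config. This is just a sketch using the built-in spacy.ngram_range_suggester.v1; the right max_size depends on your data:

[components.spancat.suggester]
@misc = "spacy.ngram_range_suggester.v1"
min_size = 1
max_size = 10

With this, only spans of 1-10 tokens are generated as candidates, instead of every length seen in the data.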
I'm still new and learning the fundamentals, but as a temporary resource solution you can throw capacity at it: get a spot Azure VM and test until your system is optimized.
I have a spot instance (US-East 2) where I can use an ND96amsr_A100_v4 for ~$13-15/hr, which includes:
@ines I recently encountered a similar issue, and setting the max size for the suggester to 60 (down from 90) worked! 10.1/12.0 GB of GPU memory in use for 50 different job postings of varying lengths.
@mwade-noetic I also set my gpu_allocator to "pytorch", so try that as well if you haven't already.
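In the training config, that's the gpu_allocator setting under the [system] block, e.g.:

[system]
gpu_allocator = "pytorch"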