Intermittent segmentation fault with prodigy ner.batch-train ner_product en_core_web_sm --output vuln_model --n-iter 10 --eval-split 0.2 --dropout 0.2
when ner_product has about 5k examples taken from malware vulnerability reports
Yes. Could be memory issues. I see two Python 3.5 processes, one using about 9GB and one about 5GB, and the machine has 16GB of RAM (macOS Sierra 10.12.6).
That’s rather more memory than we should be using, but it’s sort of unsurprising.
Could you try setting your batch size lower? There are a number of places where we accumulate a batch out of the stream, so we can end up with some multiple of the batch size in memory.
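For example, if your version of the ner.batch-train recipe exposes a --batch-size argument (check prodigy ner.batch-train --help for the exact flags), the run above with a smaller batch might look like this:

    prodigy ner.batch-train ner_product en_core_web_sm --output vuln_model \
        --n-iter 10 --eval-split 0.2 --dropout 0.2 --batch-size 16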
Batch size was 32, and running for 40 iterations finished with a memory footprint of 29GB.
So it looks like some kind of systematic memory leak to me.
But it did complete without segfaulting this time.
Found a memory leak in the beam parsing code that looks relevant. The ParserBeam has a destructor that’s supposed to clean up the state objects, but I think there’s a reference cycle, so it’s never collected. Additionally, if it is collected, it looks like it would free the state objects too soon, which would explain the occasional segfaults.
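Just to illustrate the failure mode in plain Python (the names below are made up, not the actual spaCy/Cython code): if an object whose destructor frees resources sits in a reference cycle, reference counting never triggers the destructor, and the cleanup only happens if and when the cyclic garbage collector runs. And if that destructor then frees objects that something else still points at, you get the kind of use-after-free that shows up as a segfault.

    import gc

    class State:
        def __del__(self):
            print("state freed")

    class BeamLike:
        # made-up stand-in for an object that owns parser states
        def __init__(self):
            self.states = [State()]
            self.owner = self  # accidental reference cycle back to ourselves

        def __del__(self):
            # destructor that cleans up the owned states
            self.states.clear()

    b = BeamLike()
    del b         # refcount never reaches zero because of the cycle,
                  # so the destructor doesn't run here and the states leak
    gc.collect()  # cleanup only happens when the cyclic GC gets around to it

The actual fix is in the Cython beam code, of course; the sketch is only meant to show why a cycle can delay (or mistime) the destructor.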
It could be that I’ve fixed a different error, and the bug you’ve hit still persists. But I’ll mark this resolved, because I do think the one I’ve fixed is the relevant one.
I am also getting this error during ner.batch-train (using Prodigy version 1.5.1). The bigger my annotation dataset, the more often the error seems to occur.
I’m suffering from this as well. The bigger my dataset gets, the more frequently the segmentation fault occurs. At this point, with 11,000 examples in my NER dataset and 16GB of memory, I cannot train anything!
Fatal Python error: GC object already tracked
Thread 0x00007f4232a22700 (most recent call first):
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/threading.py", line 299 in wait
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/threading.py", line 551 in wait
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tqdm/_monitor.py", line 68 in run
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/threading.py", line 884 in _bootstrap
Current thread 0x00007f429e6c9080 (most recent call first):
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 442 in batch_train
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/plac_core.py", line 207 in consume
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/plac_core.py", line 328 in call
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/prodigy/__main__.py", line 259 in <module>
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/runpy.py", line 85 in _run_code
File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/runpy.py", line 193 in _run_module_as_main
[1] 17112 abort (core dumped) python -m prodigy ner.batch-train .....
@aniruddha Try setting a lower batch size, and perhaps also a lower beam-width. We’re working on addressing these memory-usage issues within spaCy. Thanks for your patience – I know it’s not very satisfying!
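For instance, assuming your Prodigy version exposes --batch-size and --beam-width on ner.batch-train (again, prodigy ner.batch-train --help will show the exact options), something like:

    python -m prodigy ner.batch-train your_dataset en_core_web_sm \
        --output /tmp/ner_model --n-iter 10 --batch-size 16 --beam-width 4

should keep much less of the stream in memory at any one time (your_dataset and /tmp/ner_model are placeholders here).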