Segmentation fault (intermittent)

Intermittent segmentation fault with
prodigy ner.batch-train ner_product en_core_web_sm --output vuln_model --n-iter 10 --eval-split 0.2 --dropout 0.2
when ner_product has about 5k examples taken from malware vulnerability reports.

```
/Users/cbrew/anaconda3/bin/prodigy: line 1: 53104 Segmentation fault: 11  python -m prodigy "$@"
```

Thanks for the report.

Could it be running out of memory? I think I have some calls to malloc where I haven’t checked the error code.

Yes, it could be memory issues. I see two Python 3.5 processes, one using about 9GB and the other about 5GB, and the machine has 16GB RAM (macOS Sierra 10.12.6).

That’s rather more memory than we should be using, but it’s sort of unsurprising.

Try setting your batch size lower? There are a number of places where we accumulate a batch out of the stream, so we can end up with some multiple of the batch size in memory.
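To illustrate what “some multiple of the batch size” can mean in practice, here’s a simplified Python sketch (not the actual Prodigy internals): when batching stages are stacked on top of a stream, each stage eagerly materialises its own buffer.

```python
from itertools import islice

def batched(stream, size):
    """Eagerly pull up to `size` items from `stream` and yield them as a list."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Stacking two batching stages (e.g. minibatching plus a shuffle/accumulation
# buffer) means 4 * 32 = 128 examples are held in memory at once, even though
# the nominal batch size is 32.
stream = range(10000)
stages = batched(batched(stream, 32), 4)
first = next(stages)
print(len(first), len(first[0]))   # -> 4 32
```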

Both of these problems should be fixed for the next spacy-nightly (which we’re hoping to promote to v2.0.0rc1). I made some nice improvements to the speed and memory usage of the v2 parser this week: https://github.com/explosion/spaCy/pull/1438 . I’ve also opened an issue about the status code checks: https://github.com/explosion/spaCy/issues/1446

Batch size was 32, and running for 40 iterations finished with a memory image of 29GB.
So it looks like some kind of systematic memory leak to me.
But it did complete without segfaulting this time.
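(If it helps to quantify the growth, here’s a minimal sketch for sampling the training process’s resident memory from a second terminal. It assumes psutil is installed and is just a generic monitoring script, not part of Prodigy.)

```python
import sys
import time

import psutil  # assumption: installed separately, e.g. pip install psutil

# Sample the resident set size (RSS) of a running process every few seconds,
# e.g. the `python -m prodigy ...` process found via `ps` or Activity Monitor.
pid = int(sys.argv[1])
proc = psutil.Process(pid)
while True:
    try:
        rss_gb = proc.memory_info().rss / 1024 ** 3
    except psutil.NoSuchProcess:
        break  # the training process has exited
    print("RSS: {:.2f} GB".format(rss_gb))
    time.sleep(5)
```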

Found a memory leak in the beam parsing code that looks relevant. The ParserBeam has a destructor that was supposed to clean up the state objects, but I think there’s a cycle and it’s never collected. Additionally, if it is collected, it looks like it would free the state objects too soon — which would explain the occasional segfaults.
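To make the cycle problem concrete for anyone following along: the real cleanup code is Cython, but a plain-Python analogy (with made-up class names) shows why a destructor may not fire when its owner sits in a reference cycle.

```python
import gc

class Beam:
    """Toy stand-in for an object whose destructor must free some state."""
    def __init__(self):
        self.states = ["state-0", "state-1"]
        self.owner = None            # will point back at whoever holds us

    def __del__(self):
        print("freeing", len(self.states), "state objects")

class Parser:
    def __init__(self):
        self.beam = Beam()
        self.beam.owner = self       # cycle: Parser -> Beam -> Parser

parser = Parser()
del parser     # refcounts never hit zero because of the cycle, so __del__ does not run here
print("after del")
gc.collect()   # only the cyclic collector reclaims the pair; __del__ runs now
```

In pure Python the cycle collector does eventually reclaim the pair, but cleanup that is delayed (or, in C-level code, never triggered) leaks the state objects; and if collection does happen while something else still holds pointers into those states, freeing them at that point is exactly the kind of thing that produces an occasional segfault.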

It could be that I’ve fixed a different error, and the bug you’ve hit still persists. But I’ll mark this resolved, because I do think the one I’ve fixed is the relevant one.

Just FYI, I’m hitting this same segfault message intermittently with the latest Prodigy version. Let me know if I can do anything to help track it down.

@sooheon Thanks for the report! Are you able to find a sentence where it consistently fails?

I am also getting this error during NER batch training (using Prodigy version 1.5.1). The bigger my annotation dataset, the more often the error seems to occur.

Suffering as well. The bigger my dataset gets, the more frequently the segmentation fault occurs. At this point, with 11,000 examples in my NER dataset and 16GB of memory, I cannot train anything!

```
free(): invalid size
[1] 18484 segmentation fault (core dumped) python -m prodigy ner.batch-train .....

Fatal Python error: GC object already tracked

Thread 0x00007f4232a22700 (most recent call first):
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/threading.py", line 299 in wait
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/threading.py", line 551 in wait
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tqdm/_monitor.py", line 68 in run
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/threading.py", line 884 in _bootstrap

Current thread 0x00007f429e6c9080 (most recent call first):
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 442 in batch_train
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/plac_core.py", line 207 in consume
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/plac_core.py", line 328 in call
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/site-packages/prodigy/__main__.py", line 259 in <module>
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/runpy.py", line 85 in _run_code
  File "/home/aniruddha/.pyenv/versions/3.6.5/lib/python3.6/runpy.py", line 193 in _run_module_as_main
[1]    17112 abort (core dumped)  python -m prodigy ner.batch-train .....
```

@aniruddha Try setting a lower batch size, and perhaps also a lower beam-width. We’re working on addressing these memory-usage issues within spaCy. Thanks for your patience – I know it’s not very satisfying!
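In concrete terms, that would look something like the command below. The dataset and output names are placeholders, and the --batch-size / --beam-width flags assume a Prodigy version that exposes them; treat the values as a starting point to experiment with, not a recommendation.

```
prodigy ner.batch-train your_dataset en_core_web_sm --output /tmp/ner_model --n-iter 10 --batch-size 16 --beam-width 4
```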