ner.batch-train randomly fails: "Python has stopped" / segmentation fault

When training a model with ner.batch-train on both Windows and Linux, the process fails randomly. On Linux this results in a segmentation fault; on Windows a popup tells me Python has stopped. I have checked that I am not running out of RAM. Batch size does not appear to have any impact on this issue, though I have not run it for multiple iterations with a batch size of 1 because of how slow it is.

I am unable to find any trigger for the problem. Sometimes training is able to complete 10+ epochs without issues, other times a segmentation fault occurs in the first few epochs, or even before completion of the first epoch.

EDIT: I have now run ner.batch-train with a batch size of 1, as suggested in some related topics. It still results in a segmentation fault.

This is most likely a bug in spaCy, which I think I’m getting close to resolving. But if you have a chance, could you see whether you can find a single document that it fails on consistently? If you do, and you’re able to email me the document, it would definitely be helpful. Otherwise, I’ll update this thread when I know more.
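
In case it helps with the search for a single failing document, here is a minimal sketch of one way to do it. It runs each text in its own Python process, so a crash kills only that child and the loop can carry on. The helper name `find_crashing_texts`, the `WORKER` snippet and the model name `en_core_web_sm` are all assumptions for illustration, not part of Prodigy or spaCy. The worker only parses the text rather than training on it, so it would need to be replaced with whatever step actually crashes for you.

```python
import subprocess
import sys

# Hypothetical worker: reads one text from stdin and runs a spaCy pipeline on it.
# Replace the body with whatever step crashes for you. The model name is passed
# as the first command-line argument.
WORKER = (
    "import sys, spacy\n"
    "nlp = spacy.load(sys.argv[1])\n"
    "nlp(sys.stdin.buffer.read().decode('utf8'))\n"
)


def find_crashing_texts(texts, worker_code=WORKER, worker_args=("en_core_web_sm",)):
    """Run each text in a fresh Python process.

    Returns a list of (index, returncode, stderr) for every text whose
    process exited with a non-zero return code.
    """
    failures = []
    for i, text in enumerate(texts):
        proc = subprocess.run(
            # -X faulthandler makes the child dump a Python traceback to
            # stderr if it receives a fatal signal such as SIGSEGV.
            [sys.executable, "-X", "faulthandler", "-c", worker_code, *worker_args],
            input=text.encode("utf8"),
            capture_output=True,
        )
        if proc.returncode != 0:
            failures.append((i, proc.returncode, proc.stderr.decode("utf8", "replace")))
    return failures
```

On Linux a child killed by a segmentation fault is reported with a return code of -11, since `subprocess` uses a negative value for termination by a signal. Because the failure is random, one pass may not be conclusive, so it is probably worth running the suspect texts several times before treating any of them as a consistent reproducer.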