Now I did some further testing. After some more iterations it also crashed with batch size 1 (sorry for the confusion). I closely monitored the memory. It was quite constant at around 70%, with brief peaks around 80% after each iteration. The crashes usually occur during the iterations, not at the peaks.
Hmm. I know it’s possible to compile everything so that it’s possible to get a core dump, which would let us get a stack trace. I never actually use those tools though — largely because the stack trace doesn’t necessarily tell you when the first error was. Still, maybe this is a situation where it’d be good to have that…
Could you set the batch size to 1, and log the sentences? You’ll want to write to a file, and be sure to set the bufsize argument to 1 (see here: https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file ). We need to set the bufsize to make sure we get the last write. Otherwise, the program may crash with the critical sentence waiting to be written.
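A minimal sketch of what I mean (the sentences and the `train_on` stub are placeholders, not from your actual script). Passing `buffering=1` to `open` gives line buffering in text mode, so each line hits the file as soon as it's written:

```python
# Log each sentence just before training on it. With buffering=1 (line
# buffering), every line is flushed to disk on the newline -- so if the
# process crashes mid-update, the offending sentence is already in the log.

def train_on(text):
    pass  # stand-in for the real update step

sentences = ["First example.", "Second example.", "Third example."]

with open("sentences.log", "w", buffering=1) as log:
    for text in sentences:
        log.write(text + "\n")  # flushed immediately (line-buffered)
        train_on(text)          # if this crashes, the line is on disk
```

Then the last line of `sentences.log` after a crash should be the sentence that triggered it.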
Hopefully we’ll see it crashing on the same sentence each time. If not, that means the crash depends on both the sentence and the state of the weights, which will make things more difficult.
I have the same problem when I train with --unsegmented: it always crashes regardless of the texts, and never finishes the first iteration. I tried batch size 1 and opened the beam width up to 1024, 2048, even 5096. The memory usage is not that high; I'm only training with fewer than 500 examples, and the annotated texts are between 400 and 1000 tokens.
Using prodigy 1.6.1, spacy 2.0.16, thinc 6.12.0 on Windows 10
@AlejandroJCR Could you try pip install -U spacy==2.0.17.dev0 ? I fixed an out of bounds access that I think might be the cause of your problem. We should have 2.0.17 out shortly, but in the meantime you can try the development build.
Matt, apparently you have fixed both the NER training in Prodigy with long texts (it's working now, thanks) and the issue in spaCy. I was exporting the annotations and trying to train with the standard greedy method, but I was getting weird behavior (PyCharm error: "Process finished with exit code -1073741819 (0xC0000005)"). I tried training with both now and they are working OK. Thanks a lot for your ultra-fast response.