ner.batch-train "python stopped working"

I’m getting consistent crashes with ner.batch-train. It shows about 8,000 examples for training and verbose logging doesn’t show any potential issues.

When I choose to debug Python, I get this error:

Unhandled exception at 0x00007FFC55EB1E92 (ner.cp36-win_amd64.pyd) in python.exe: 0xC0000005: Access violation reading location 0x0000024C06814FF4.

Any ideas?

Hmm that’s not easy :(. Could you set batch size to one and print out the examples as it trains, to find out which text it crashes on?

Well, it hasn’t been crashing with a batch size of 1. I’ll keep an eye on it and post here if I can gather more info.

Thanks!

Have there been any insights on this issue so far?
I have a similar problem:

  • ner.batch-train stops Python from working roughly every 3rd iteration on average
  • with batch size 1, I haven't had a crash yet
  • I started from scratch and built a new dataset, and the same error occurred

@Ben Hmm! What happens if you set batch size to 2? Also, could you keep an eye on memory usage, to check whether that could be the problem?
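If it helps, here's one way to keep an eye on it from a separate terminal. This is just a sketch and assumes the psutil package, which isn't part of Prodigy:

    # Hypothetical helper, not part of Prodigy: print system memory usage
    # every few seconds so a crash can be correlated with memory pressure.
    import time

    import psutil  # pip install psutil

    while True:
        mem = psutil.virtual_memory()
        print(f"used: {mem.percent}%  available: {mem.available // (1024 ** 2)} MB")
        time.sleep(5)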

Now I did some further testing. After some more iterations it also crashed with batch size 1 (sorry for the confusion). I closely monitored the memory. It was quite constant at around 70% with small peaks of 80% after each iteration. The crashes usually occur during the iterations, so not during the peak time.

Hmm. I know it's possible to compile everything so that we can get a core dump, which would let us get a stack trace. I never actually use those tools though — largely because the stack trace doesn't necessarily tell you where the first error was. Still, maybe this is a situation where it'd be good to have that…

Could you set the batch size to 1, and log the sentences? You’ll want to write to a file, and be sure to set the bufsize argument to 1 (see here: https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file ). We need to set the bufsize to make sure we get the last write. Otherwise, the program may crash with the critical sentence waiting to be written.
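To be concrete, something along these lines should do it. This is just a sketch, not Prodigy's internals — `examples` here stands in for however you're streaming the training data:

    # Sketch only: replace `examples` with your actual stream of training tasks.
    examples = [{"text": "First training sentence."}, {"text": "Second one."}]

    # buffering=1 means line-buffered in text mode, so each line is flushed as
    # soon as it's written and the last text survives a hard crash.
    with open("train_log.txt", "w", buffering=1, encoding="utf8") as log_file:
        for eg in examples:
            log_file.write(eg["text"] + "\n")
            # ... update the model on this single example here ...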

Hopefully we’ll see it crashing on the same sentence each time. If not, that means the crash depends on both the sentence and the state of the weights, which will make things more difficult.

Thanks for your patience on this!

I haven't fixed this, but I have been working with a model that made it through 4 iterations once. Just an update:

I switched from Windows to Mac and with the same dataset there is no error anymore.

I have the same problem when I train with --unsegmented: it always crashes, independent of the texts, and never finishes the first iteration. I tried batch size 1 and opened the beam width up to 1024, 2048, and even 5096. Memory usage isn't very high; I'm only training with fewer than 500 examples, and the annotations are between 400 and 1,000 tokens long.
Using Prodigy 1.6.1, spaCy 2.0.16, and Thinc 6.12.0 on Windows 10.

@AlejandroJCR Could you try pip install -U spacy==2.0.17.dev0? I fixed an out-of-bounds access that I think might be the cause of your problem. We should have 2.0.17 out shortly, but in the meantime you can try the development build.

Matt, apparently you've fixed both issues: NER training in Prodigy with long texts is working now, thanks, and it's also working fine in spaCy. I had exported the annotations and tried to train with the standard greedy method, but was getting weird behavior (PyCharm error: "Process finished with exit code -1073741819 (0xC0000005)"). I've now tried training with both, and they're working fine. Thanks a lot for your ultra-fast response.