Now I did some further testing. After some more iterations it also crashed with batch size 1 (sorry for the confusion). I closely monitored the memory. It was quite constant at around 70%, with brief peaks around 80% after each iteration. The crashes usually occur during the iterations, not at the peaks.
Hmm. I know it’s possible to compile everything so that it’s possible to get a core dump, which would let us get a stack trace. I never actually use those tools though — largely because the stack trace doesn’t necessarily tell you when the first error was. Still, maybe this is a situation where it’d be good to have that…
Could you set the batch size to 1, and log the sentences? You’ll want to write to a file, and be sure to set the bufsize argument to 1 (see here: https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file ). We need to set the bufsize to make sure we get the last write. Otherwise, the program may crash with the critical sentence waiting to be written.
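A minimal sketch of what I mean (the sentences and the `train_on` stub are placeholders, not from your actual script). Passing `buffering=1` to `open` gives line buffering in text mode, so each line hits the file as soon as it's written:

```python
# Log each sentence just before training on it. With buffering=1 (line
# buffering), every line is flushed to disk on the newline -- so if the
# process crashes mid-update, the offending sentence is already in the log.

def train_on(text):
    pass  # stand-in for the real update step

sentences = ["First example.", "Second example.", "Third example."]

with open("sentences.log", "w", buffering=1) as log:
    for text in sentences:
        log.write(text + "\n")  # flushed immediately (line-buffered)
        train_on(text)          # if this crashes, the line is on disk
```

Then the last line of `sentences.log` after a crash should be the sentence that triggered it.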
Hopefully we’ll see it crashing on the same sentence each time. If not, that means the crash depends on both the sentence and the state of the weights, which will make things more difficult.
I have the same problem when I train with --unsegmented: it always crashes regardless of the texts, and never finishes the first iteration. I tried batch size 1 and opened the beam width up to 1024, 2048, even 5096. The memory usage is not that high; I'm only training with fewer than 500 examples, and the annotated texts are between 400 and 1000 tokens.
Using prodigy 1.6.1, spacy 2.0.16, thinc 6.12.0 on Windows 10
@AlejandroJCR Could you try pip install -U spacy==2.0.17.dev0 ? I fixed an out of bounds access that I think might be the cause of your problem. We should have 2.0.17 out shortly, but in the meantime you can try the development build.
Matt, apparently you have fixed both the NER training in Prodigy with long texts (it's working now, thanks) and the issue in spaCy. I was exporting the annotations and trying to train with the standard greedy method, but I was getting weird behavior (PyCharm error: "Process finished with exit code -1073741819 (0xC0000005)"). I tried training with both now and they are working OK. Thanks a lot for your ultra-fast response.