I've recently been hitting random crashes when training an NER model on a slightly larger dataset (~2000 annotations). It usually crashes after ~20 iterations, always in ner.cp36-win_amd64.pyd.
Decreasing the batch size or the number of iterations, or simply rerunning the training process, sometimes resolves the issue.
Before each run, Prodigy also warns about a numpy data structure size mismatch. I suspect this is because I have the latest versions of all Python libraries installed, which differ from the versions Prodigy was compiled against.
Could you post the exact versions of all Prodigy dependencies (i.e. the versions used during development and testing), so I can install these known-good versions?
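In the meantime, one way to capture a known-good set of versions from any environment where training does work (assuming pip is the package manager) is to freeze and reinstall them:

```shell
# Record the exact versions installed in a working environment
pip freeze > requirements.txt

# Reproduce those exact versions in another environment
pip install -r requirements.txt
```

This only pins what's currently installed; it doesn't tell you which versions Prodigy itself was built against.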
The numpy warning is not problematic. Unfortunately, it's raised whenever the numpy versions used at runtime and at build time differ. You can get rid of the warning with
pip install "numpy<1.15.0". We've avoided pinning to these earlier versions of numpy because some other libraries may pin to versions greater than 1.15.0. The warnings are annoying, but dependency hell is much worse. The next release of spaCy suppresses the warning.
We’re still working on figuring out why this intermittent segfault occurs. If you have a batch of examples it fails on persistently, that would be very helpful. In the meantime, we appreciate your patience.
If you PM me an email address I can send you a copy of my prodigy database.
Could you extract the dataset with prodigy db-out and email the JSONL file to firstname.lastname@example.org?
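Before emailing the export, it may be worth a quick sanity check that every line is valid JSON. A small sketch (the filename dataset.jsonl is just a hypothetical example of what prodigy db-out might produce):

```python
import json

def count_annotations(path):
    """Count the annotation records in a JSONL export, failing fast
    if any line is not valid JSON."""
    n = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                json.loads(line)  # raises ValueError on a corrupt line
                n += 1
    return n
```

If the export is intact, count_annotations("dataset.jsonl") should report roughly the ~2000 annotations mentioned above.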