So after building a collection of annotations in my database, I proceeded to run `ner.batch-train` in my PyCharm terminal, like so:

```
python -m prodigy ner.batch-train annotations en_core_web_lg --output addresses --n-iter 5 --label ADDR --batch-size 16 --eval-split 0.2
```
However, before the first or second iteration completes, the process suddenly terminates without any exception or error message being displayed. The terminal is simply ready to accept new input again, as if no process had been running.
Following advice in similar older threads, I tried running it with a batch size of 1. However, this wasn't a sustainable solution: training slowed down by a factor of 10, and a single iteration took more than an hour. (For the record, the premature termination had not occurred by the end of the first iteration, but I cancelled the run before completion because waiting it out wasn't a prudent use of debugging time.)
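For reference, the slow run was the same command as above with only the batch size changed:

```
python -m prodigy ner.batch-train annotations en_core_web_lg --output addresses --n-iter 5 --label ADDR --batch-size 1 --eval-split 0.2
```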
Seeing as there are similar unresolved threads on this issue, I'm wondering:
- Is there a consensus as to why this premature termination is happening? It seems to work fine if I push the code to my Mac.
- What exactly does `batch-size` do that leads you to believe it could be the solution to this problem?
- How does one enable the debugging output on Windows in the PyCharm terminal? Putting `export PRODIGY_LOGGING=basic` in my venv's `activate` file did not seem to do anything. (A guess of mine as to why follows this list.)
- Is there a fix that doesn't incur an unreasonable performance hit?
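On the logging question, a guess on my part rather than something I've verified: the PyCharm terminal on Windows is typically cmd.exe or PowerShell rather than bash, so the bash-style `activate` script (and its `export` line) may never be sourced. If that's the case, setting the variable directly in the session before training might be what's needed, e.g. in cmd.exe:

```
:: set Prodigy's logging level for this session only (cmd.exe syntax)
set PRODIGY_LOGGING=basic
python -m prodigy ner.batch-train annotations en_core_web_lg --output addresses --n-iter 5 --label ADDR --batch-size 16 --eval-split 0.2
```

(In PowerShell, the equivalent would be `$env:PRODIGY_LOGGING = "basic"`.) Is that a correct reading?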
Any help is welcome.