ner.batch-train is really slow


I created a dataset “email” with around 2k annotations and then tried batch-train, without success. It is really slow:

C:\Users\Damiano>python -m prodigy ner.batch-train email it_core_news_sm --output C:\Users\Damiano\Model --n-iter 10 --eval-split 0.2 --dropout 0.2

Loaded model it_core_news_sm
Using 20% of accept/reject examples (226) for evaluation
Using 100% of remaining examples (916) for training
Dropout: 0.2  Batch size: 32  Iterations: 10

BEFORE     0.000
Correct    0
Incorrect  39
Entities   3827
Unknown    3822

#          LOSS       RIGHT      WRONG      ENTS       SKIP       ACCURACY
  7%|█████▋                                                                           | 64/916 [03:00<40:03,  2.82s/it]

That is not many annotations, so I do not think 2k annotations should be a problem.
It has been running for at least 30 minutes.

What can I check? Thank you.

I have also tried adding --label EMAIL (my new entity type), but I get the same problem…

What version are you running? We fixed a significant performance problem on this in the latest update.


C:\Users\Damiano>python -m prodigy stats

  ?  Prodigy stats

  Version            1.4.0
  Location           C:\Program Files\Python3.6.4\lib\site-packages\prodigy
  Prodigy Home       C:\Users\Damiano\.prodigy
  Platform           Windows-10-10.0.16299-SP0
  Python Version     3.6.4
  Database Name      SQLite
  Database Id        sqlite
  Total Datasets     8
  Total Sessions     23

I bought Prodigy a few days ago (23 March).

We just released 1.4.1 last night — try that?

I will try 1.4.1 and update this thread. Thanks.

@honnibal unfortunately, no good news.

It seems faster but hangs after the first iteration. This is the output:

C:\Users\Damiano\lavoro\python>python -m prodigy ner.batch-train email it_core_news_sm --output C:\Users\Damiano\Model --n-iter 10 --eval-split 0.2 --dropout 0.2 --label EMAIL
Using 1 labels: EMAIL

Loaded model it_core_news_sm
Using 20% of accept/reject examples (226) for evaluation
Using 100% of remaining examples (916) for training
Dropout: 0.2  Batch size: 32  Iterations: 10

BEFORE     0.000
Correct    0
Incorrect  34
Entities   4257
Unknown    1029

#          LOSS       RIGHT      WRONG      ENTS       SKIP       ACCURACY
01         95.834     1          33         3021       0          0.029
 66%|█████████████████████████████████████████████████████                           | 608/916 [03:05<01:34,  3.27it/s]

I have also tried setting PRODIGY_LOGGING to verbose, but I do not see any interesting information to give you.

Is batch-train this unstable on Windows?

It is the same with ner.train-curve.

Hmm! On Windows I could imagine a few different problems. I should’ve noticed your path in your earlier snippet.

Did you install via conda or pip? Could you run this benchmark and tell me what it says?

from timeit import default_timer as timer
import numpy

def main():
    # Dimensions of the product: (m, k) x (k, n) -> (m, n)
    m = 20000
    n = 10000
    k = 5000
    # Fill two float32 matrices with uniform random values
    A = numpy.zeros((m,k), dtype='f')
    A += numpy.random.uniform(size=A.size).reshape(A.shape)
    B = numpy.zeros((k,n), dtype='f')
    B += numpy.random.uniform(size=B.size).reshape(B.shape)
    # Time only the matrix multiplication
    start = timer()
    C = numpy.dot(A, B)
    end = timer()
    print(end-start, C.sum())

if __name__ == '__main__':
    main()

On my machine, the timing of this benchmark varies greatly depending on how numpy is configured. If I install numpy via conda, the matrix multiplication is performed via Intel’s mkl library, and the calculation finishes in 12.2s. If I install numpy via pip with default settings, it’s linked against a version of OpenBLAS that does not detect my CPU correctly. That version takes 55.6s to complete. On Windows, I’m not sure what numpy does by default. It’s possible it’s fallen back to a reference BLAS library with no assembly kernels, which can be 10-20x slower than it should be.

The next versions of spaCy and Thinc address this by shipping OpenBLAS’s matrix multiplication function within Thinc. But in the meantime, if you’ve installed numpy via pip, try using conda to install numpy. You can still install the Prodigy wheel via pip — but just install numpy into the environment first.
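To see which BLAS library a given numpy build is linked against, something like the following may help. It is only a sketch: `time_matmul` is a hypothetical helper, its default matrix sizes are arbitrary and much smaller than in the benchmark above, and the exact output of `numpy.show_config()` differs between numpy versions.

```python
from timeit import default_timer as timer

import numpy


def time_matmul(m=2000, n=1000, k=500):
    """Time one float32 matrix product of shape (m, k) x (k, n)."""
    A = numpy.random.uniform(size=(m, k)).astype('f')
    B = numpy.random.uniform(size=(k, n)).astype('f')
    start = timer()
    C = numpy.dot(A, B)
    end = timer()
    # Return the elapsed seconds and a checksum of the result.
    return end - start, float(C.sum())


if __name__ == '__main__':
    # Print the BLAS/LAPACK build information numpy was compiled with.
    numpy.show_config()
    seconds, checksum = time_matmul()
    print(seconds, checksum)
```

Running this once in a pip environment and once in a conda environment on the same machine should show whether the two builds use different BLAS libraries and whether that changes the timing.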

Hi @honnibal
maybe I have solved my problem by leaving this stupid OS… I have just installed Ubuntu 16.04. :smiley:
I will install numpy via conda and maybe run Win 10 Pro in a virtual machine.

However, I ran your script on Ubuntu and the result is: 7.77815374399961 250011010000.0
numpy via pip seems fast enough, no?

@honnibal you will not believe it, but your script runs faster on Windows.

The result is: 5.084604135241529 249949060000.0

I have reinstalled it on the same machine to test it for you. So, now, we know numpy is not the problem.

Could you provide some more details about your data? How long are your examples? Do you have the same problem if you use spacy train?

I wonder whether something is wrong with the Italian model…

@honnibal I am trying to use Prodigy, but I get many “out of memory” errors when using ner.batch-train and ner.train-curve.
Before, I was using big documents (around 600-800 tokens long), but now the texts are smaller. I have created a custom recipe that shows 100 characters before and after the entity to confirm, so the maximum is about 250 characters now.

Matt, another problem is that we cannot split a dataset. If memory is the problem, we would have to split the annotations, but that does not seem possible right now. Correct?


600-800 tokens isn’t that long. When I wrote the beam search I did assume one sentence per text, so I was thinking 50-60 words maximum. If possible, it would be better to let Prodigy use sentence boundary detection than to impose the character-based cutoff, which will cut words partway through.
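If a character-based window is kept anyway, one way to avoid cutting words is to widen the window to the nearest whitespace on each side. The sketch below is an assumption for illustration only: `context_window` is a hypothetical helper, it is not part of Prodigy, and it snaps to word boundaries rather than using sentence boundary detection.

```python
def context_window(text, start, end, width=100):
    """Return text[start:end] plus roughly `width` characters of context on
    each side, widened so that no word is cut partway through."""
    left = max(0, start - width)
    right = min(len(text), end + width)
    # Move the left edge back to the start of the word it landed in.
    while left > 0 and not text[left - 1].isspace():
        left -= 1
    # Move the right edge forward to the end of the word it landed in.
    while right < len(text) and not text[right].isspace():
        right += 1
    return text[left:right]
```

The returned span can be slightly longer than `2 * width` plus the entity, since each edge only ever moves outward.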

You can do prodigy db-out on your dataset to get a .jsonl file, and then split that up and feed it back in with prodigy db-in. I hope that’s not necessary though – I don’t think the size of the dataset is the problem.
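The splitting step in between could look roughly like this. It is a sketch under assumptions: `split_jsonl` and the file names are hypothetical, and the input is taken to be a file exported with prodigy db-out, with one JSON record per line.

```python
def split_jsonl(path, n_parts, out_prefix):
    """Split a .jsonl file into n_parts files, distributing lines round-robin."""
    with open(path, encoding='utf8') as f:
        # Skip blank lines and normalise line endings.
        lines = [line.rstrip('\n') + '\n' for line in f if line.strip()]
    out_paths = []
    for i in range(n_parts):
        out_path = '{}_{}.jsonl'.format(out_prefix, i)
        with open(out_path, 'w', encoding='utf8') as out:
            # Every n_parts-th line, starting at offset i.
            out.writelines(lines[i::n_parts])
        out_paths.append(out_path)
    return out_paths
```

Each resulting file can then be loaded into its own dataset with prodigy db-in. Distributing lines round-robin keeps accept and reject examples spread across the parts instead of grouping them by annotation order.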

@honnibal I do not know, but as I told you, I sometimes get out of memory errors.

I have 32 GB of RAM and 12 GB of swap.

To give you an example, at the moment I am running ner.batch-train with this command:

python3 -m prodigy ner.batch-train citizenship /home/damiano/ner --output /home/damiano/ner_test --n-iter 100

At the 50th iteration the Python process is using 22.5 GB!

There are just 2012 annotations, and the texts are around 220-250 characters long.

damiano@damiano:~$ python3 -m prodigy stats citizenship

  ✨  Prodigy stats

  Prodigy Home       /home/damiano/.prodigy 
  Database Id        sqlite             
  Location           /usr/local/lib/python3.5/dist-packages/prodigy 
  Database Name      SQLite             
  Total Sessions     18                 
  Platform           Linux-4.13.0-37-generic-x86_64-with-Ubuntu-16.04-xenial 
  Total Datasets     2                  
  Version            1.4.1              
  Python Version     3.5.2               

  ✨  Dataset 'citizenship'

  Created            2018-04-03 16:15:02 
  Dataset            citizenship        
  Reject             1430               
  Ignore             0                  
  Author             None               
  Description        None               
  Accept             582                
  Annotations        2012  

I had to restart my PC because after the 60th iteration it became unstable and froze completely.

I do not know what happens internally, but after each iteration it should only update the weights, no? Why is so much memory needed?

Hmm. I regularly train on a 32gb machine. Maybe try:

python3 -m prodigy ner.batch-train citizenship /home/damiano/ner --output /home/damiano/ner_test --n-iter 100 --batch-size 4 --beam-width 4

This sets the batch size and beam width smaller. Also try:

token_vector_width=64 hidden_width=64 python3 -m prodigy ner.batch-train citizenship /home/damiano/ner --output /home/damiano/ner_test --n-iter 100 --batch-size 4 --beam-width 4

This sets the dimensions much smaller within the neural network, which should make things faster. Hopefully these settings make it less painful to run the model while we figure out whether there’s a memory leak.
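The `VAR=value command` form above is shell syntax and will not work in the Windows command prompt. A rough Python equivalent is sketched below. It assumes the two settings are read from environment variables, as the command above suggests. `build_invocation` and `run_training` are hypothetical helper names, and only flags that already appear in this thread are used.

```python
import os
import subprocess
import sys


def build_invocation(dataset, model, output_dir, n_iter=100, batch_size=4,
                     beam_width=4, token_vector_width=64, hidden_width=64):
    """Return (command, environment) mirroring the shell line above."""
    env = dict(os.environ)
    # Environment variable values must be strings.
    env['token_vector_width'] = str(token_vector_width)
    env['hidden_width'] = str(hidden_width)
    cmd = [sys.executable, '-m', 'prodigy', 'ner.batch-train', dataset, model,
           '--output', output_dir, '--n-iter', str(n_iter),
           '--batch-size', str(batch_size), '--beam-width', str(beam_width)]
    return cmd, env


def run_training(*args, **kwargs):
    """Launch the training run; requires Prodigy to be installed."""
    cmd, env = build_invocation(*args, **kwargs)
    return subprocess.call(cmd, env=env)
```

For example, `run_training('citizenship', '/home/damiano/ner', '/home/damiano/ner_test')` would start the same run as the second command above.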

@honnibal does it impact the accuracy?

Reducing the model size is likely to reduce accuracy, yes. As for batch size and beam width, that’s less certain.

It seems more accurate :slight_smile: (I tried the first command). I did not touch token_vector_width=64 hidden_width=64.

However, if I used beam width 4 for this custom entity, can I not change it for other entities?