Prodigy ner.batch-train no longer multi-threaded?

Since updating Prodigy to the latest version and using spaCy 2.1, ner.batch-train is no longer using multiple cores on the SageMaker Notebook Instance I have set up for hyperparameter tuning. This is an annoying problem, as models are taking 7-8x longer to train, slowing down my experimentation.

Is there some flag for the number of cores/threads that I am missing, or something I should be doing differently when installing? My current dataset size doesn’t really warrant GPU acceleration, so I have been using the CPU installation of spaCy.

This may be related to https://github.com/explosion/spaCy/issues/3820; however, updating numpy and installing spaCy with conda instead of pip does not change the behavior.
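
In case it helps, here’s how I’ve been checking which BLAS backend numpy is linked against and capping its threads from the shell. These are standard numpy/BLAS environment variables, not Prodigy flags, and which variable applies depends on the backend:

```python
import numpy as np

# Print the BLAS/LAPACK libraries numpy was built against (OpenBLAS, MKL, ...).
# Which one is linked determines which *_NUM_THREADS variable actually applies.
np.show_config()

# The thread caps have to be exported before numpy/spaCy are imported,
# i.e. in the shell that launches the recipe (dataset/model names below
# are placeholders):
#   OMP_NUM_THREADS=8 OPENBLAS_NUM_THREADS=8 MKL_NUM_THREADS=8 \
#       prodigy ner.batch-train my_dataset en_core_web_sm
```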

spaCy v2.1 should be quite a lot more efficient than v2.0, which didn’t use multiple cores very effectively. Launching threads for the matrix multiplications is generally inefficient, and in v2.0 it often resulted in slower processing than single-threaded execution.

I’m surprised that you’re finding the training slower, as I would expect it to be faster. Have you checked to make sure the batch sizes are the same? Also, what sort of CPU does the SageMaker Notebook server have? Is it a normal AWS VM, or something more exotic?
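
If it helps to sanity-check whether threaded BLAS matmuls are actually a win on your instance, here’s a rough benchmark sketch using the third-party threadpoolctl package (not something spaCy or Prodigy ship; the matrix size is arbitrary):

```python
import time

import numpy as np
from threadpoolctl import threadpool_limits  # pip install threadpoolctl

def time_matmul(n=2000, repeats=5):
    """Average wall-clock time of an n x n float32 matrix multiplication."""
    a = np.random.rand(n, n).astype("float32")
    b = np.random.rand(n, n).astype("float32")
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    return (time.perf_counter() - start) / repeats

# Default: the BLAS backend uses as many threads as it likes.
print("default thread pool:", time_matmul())

# Capped to one thread, roughly what spaCy v2.1 assumes for its matmuls.
with threadpool_limits(limits=1):
    print("single-threaded:    ", time_matmul())
```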

My SageMaker instance is an ml.m5.4xlarge. With Prodigy 1.8.1, ner.batch-train uses 99-100 %CPU; with 1.7.1, usage is around 1400 %CPU.
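
For reference, a %CPU reading like this can also be taken programmatically with the third-party psutil package (the PID below is a placeholder for the running training process):

```python
import psutil  # pip install psutil

# Attach to the running ner.batch-train process by its PID (placeholder).
proc = psutil.Process(12345)

# On multi-core machines cpu_percent() can exceed 100%: ~1400% means roughly
# 14 busy cores, while ~100% means effectively single-threaded.
print(proc.cpu_percent(interval=5.0), "%CPU")
```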

@jprusa Yes, but is it actually faster with v1.7.1? This section of the v2.1 announcement explains the context: https://explosion.ai/blog/spacy-v2-1#matrix-multiplication

Briefly, using more machine resources is not a goal in itself. If we can use 4x more resources to train 3x faster, that’s often worth it. But using 10x as many resources to train 1.5x faster isn’t so appealing. In fact, what would often happen in v2.0 is that numpy would launch far too many threads, so on a large instance you’d be using 14x the resources to train only 0.8x as quickly. That’s obviously a bad deal.
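
Put roughly as a formula, the trade-off is speedup divided by the resource multiplier, where 1.0 would be perfect scaling; plugging in the numbers above:

```python
def efficiency(speedup, resource_multiplier):
    """Speedup gained per unit of extra machine resources (1.0 = perfect scaling)."""
    return speedup / resource_multiplier

print(efficiency(3.0, 4.0))   # 0.75 -> often worth it
print(efficiency(1.5, 10.0))  # 0.15 -> not so appealing
print(efficiency(0.8, 14.0))  # ~0.06 -> the v2.0 worst case described above
```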

If you’re achieving 7x faster training with 14x the machine usage, that looks like a much better deal, so I’d like to understand why v2.1 is so much slower. Could you tell me the average number of words per text in your data, and the command you’re using to trigger the training?
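
For the average words per text, a rough estimate over a JSONL export of your examples would be enough; something like this sketch (the file name is a placeholder, and whitespace splitting only approximates spaCy’s tokenization):

```python
import json

word_counts = []
# Placeholder path to a JSONL export of the annotated examples.
with open("annotations.jsonl", encoding="utf8") as f:
    for line in f:
        example = json.loads(line)
        word_counts.append(len(example["text"].split()))

print(f"{len(word_counts)} texts, "
      f"{sum(word_counts) / len(word_counts):.1f} words per text on average")
```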

I reinstalled 1.7.1 and found it to be slower, as you expected. On further experimentation, I am seeing a lot of variance in training times on the instance and some extremely slow file I/O: sometimes loading a model takes several minutes, other times only seconds. It looks like I’m having AWS issues rather than a problem with spaCy or Prodigy.
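
For anyone else debugging this, timing the model load on its own is an easy way to separate slow disk I/O from slow training (the model name below is a placeholder for whatever base model you’re training from):

```python
import time

import spacy

start = time.perf_counter()
nlp = spacy.load("en_core_web_lg")  # placeholder base model
print(f"Model loaded in {time.perf_counter() - start:.1f}s")
```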