Will a GPU make training faster?

Since these are CNNs, it seems like training might go faster on a GPU. I see that spacy train has a --use-gpu option, but I don't see any corresponding option for prodigy.

Can we please get an answer to this question? My model training sometimes takes hours, so a GPU option would be very helpful.

Optimizing parsing and NER for GPU is actually quite difficult, since decisions need to be made on every token. The GPU training in spaCy is currently 1-5x faster depending on the batch size, document lengths and model hyper-parameters.

The problem is that some of Prodigy’s training functions use beam-search training, and I haven’t tested these functions for GPU in spaCy. The installation workflow for GPU usage is also a little bit rough.

A brief background here: I think a lot of folks have a slightly misleading perception of the relative speed of CPU and GPU in deep learning for NLP. In computer vision, common CNN architectures are vastly more efficient on GPU. This doesn't really apply in NLP: we still use CNNs, but the shape of our operations is very different, and it's actually not so easy to beat good CPU code. The tricky part is that CPU training is very much an unloved child for most deep learning frameworks. For instance, early versions of Tensorflow were usually installed without linkage to a decent BLAS library, which made CPU usage about 20x slower than it should have been.
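As a quick illustration of the BLAS point, you can check which BLAS your CPU stack is linked against. This snippet is just a generic sanity check, not specific to spaCy or Prodigy:

```python
import numpy as np

# Print the BLAS/LAPACK libraries numpy was built against.
# If this shows a reference/unoptimized BLAS rather than something like
# OpenBLAS, MKL or Accelerate, CPU matrix multiplications will be far
# slower than they need to be.
np.show_config()
```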

All that said, here’s what you need to do to train Prodigy models with GPU.

  1. Make sure Thinc is installed with GPU linkage, as described here: https://spacy.io/usage/#gpu. You should be able to run import cupy, and you should also be able to run import thinc.neural.gpu_ops.

  2. Try using the spacy train command with the -g 0 argument, and check that your GPU is actually being used. I use the nvidia-smi command for this.

  3. Modify your Prodigy recipe so that the GPU is used. For instance, if you want to use the GPU in the ner.batch-train recipe, pass use_device=0 to the nlp.begin_training() function (see the sketch below).
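Putting those steps together, here's a rough sketch of what the checks and the recipe change might look like. The exact keyword arguments can vary between spaCy and Prodigy versions, and the model name is just a placeholder, so treat this as illustrative rather than exact:

```python
# Step 1: these imports should succeed if Thinc was installed with GPU linkage.
import cupy
import thinc.neural.gpu_ops

import spacy

# Step 2 is done on the command line: run `spacy train ... -g 0` and watch
# `nvidia-smi` in another terminal to confirm the GPU is actually busy.

# Step 3: inside a copy of the ner.batch-train recipe, ask for the GPU when
# the optimizer is created.
nlp = spacy.load("en_core_web_sm")  # placeholder base model
optimizer = nlp.begin_training(use_device=0)  # 0 = first GPU
```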

@SandeepNaidu: Could you give some details of which recipe you're using, and how much data you have?

Some training tasks will in fact take hours; this is something I'm very interested in improving within spaCy, and something the broader deep learning community is actively working on. But there may be some low-hanging fruit within Prodigy that we can address; we might have some redundant operations, for instance.


Thanks @honnibal. I am trying the ner.batch-train recipe and have about 3000 annotations. I was curious about the GPU because I have used TensorFlow for NLP before and found it very fast on GPU. Yes, initial TF wasn't linked to the GPU, but they added that support later on. These particular annotations have longer sentences, and I think that is the reason for the slowness. The documents are transformed versions of structured text.

How can I use "spacy train" instead of "prodigy batch-train" with Prodigy's annotations? I think I missed some reading regarding this.

Here's a thread on this topic, including example scripts. Version 1.3.0 of Prodigy, which we've just released, includes a new ner.gold-to-spacy converter recipe that exports training data in spaCy's formats (either the "simple training style" or BILUO scheme annotations). You can also check out the source of that recipe for inspiration, and write your own conversion script.
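If you end up writing your own training script instead of using the spacy train CLI, here's a rough sketch of a loop over the exported data. It assumes the converter writes one [text, {"entities": [...]}] pair per JSONL line in the "simple training style"; the file name, model path and epoch count are placeholders:

```python
import json
import random

import spacy

# Load the data exported by ner.gold-to-spacy (file name is a placeholder).
TRAIN_DATA = []
with open("ner_annotations.jsonl", encoding="utf8") as f:
    for line in f:
        text, annotations = json.loads(line)
        TRAIN_DATA.append((text, annotations))

# Train a blank English NER model on the converted annotations (spaCy 2.x API).
nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
for _, annotations in TRAIN_DATA:
    for start, end, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer, losses=losses)
    print(epoch, losses)

nlp.to_disk("ner_model")
```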

Thanks @ines. Awesome support from both of you. I will read further.

@SandeepNaidu With 3000 annotations it probably shouldn’t take hours to train, regardless of CPU or GPU. I wonder whether there’s an efficiency problem we could fix easily. I’ll look into this and get back to you.

@honnibal did you find any efficiency problems? I am training an NER model with ~13,000 annotations and 5 labels, and it's taking around 2 hours to train. I am training on the CPU of an n1-standard-4 GCP instance (~80% CPU utilization). It could be that 2 hours is just how long it takes; it seems sort of reasonable given the dataset size. But I just wondered if there's anything I can do to speed up the training?
