Will a GPU make training faster?

Optimizing parsing and NER for GPU is actually quite difficult, since decisions need to be made on every token. GPU training in spaCy currently runs at roughly 1-5x the speed of CPU training, depending on the batch size, document lengths, and model hyper-parameters.

The problem is that some of Prodigy’s training functions use beam-search training, and I haven’t tested these functions for GPU in spaCy. The installation workflow for GPU usage is also a little bit rough.

A brief background here: I think a lot of folks have a slightly misleading perception of the relative speed of CPU and GPU for deep learning in NLP. In computer vision, common CNN architectures are vastly more efficient on GPU. This doesn’t really apply in NLP. We still use CNNs, but the shape of our operations is very different, and it’s actually not so easy to beat good CPU code. The tricky part is that CPU training is very much an unloved child for most deep learning frameworks. For instance, early versions of Tensorflow were usually installed without linkage to a decent BLAS library, which made CPU usage about 20x slower than it should have been.
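If you’re curious whether your own CPU stack is linked against a fast BLAS, numpy can print the libraries it was built against. This is just a quick, generic diagnostic, nothing Prodigy-specific:

```python
import numpy as np

# Show which BLAS/LAPACK libraries numpy was built against.
# Seeing MKL, OpenBLAS or Accelerate here means CPU matrix
# multiplication should be reasonably fast; no BLAS at all is the
# "20x slower than it should be" situation described above.
np.show_config()
```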

All that said, here’s what you need to do to train Prodigy models with GPU.

  1. Make sure Thinc is installed with GPU linkage, as described here: https://spacy.io/usage/#gpu. You should be able to do import cupy, and you should also be able to do import thinc.neural.gpu_ops.

  2. Try using the spacy train command with the -g 0 argument, and check that your GPU is actually being used. I use the nvidia-smi command for this.

  3. Modify your Prodigy recipe so that the GPU is used. For instance, if you want to use the GPU in the ner.batch-train recipe, pass use_device=0 to the nlp.begin_training() function (see the sketch after this list).
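To make steps 1 and 3 a little more concrete, here’s a rough sketch. The use_device keyword and the pipeline setup follow the older spaCy 2.x style API, so the exact spelling may differ depending on your spaCy and Prodigy versions; treat this as illustrative rather than copy-paste:

```python
import spacy

# Step 1: these imports should succeed if Thinc was built with GPU
# support (cupy plus the custom GPU kernels).
import cupy
import thinc.neural.gpu_ops

# Step 2 happens on the command line, e.g. spacy train ... -g 0,
# while watching nvidia-smi to confirm the GPU is actually busy.

# Step 3: inside a recipe like ner.batch-train, the model is set up
# with nlp.begin_training(). Passing use_device=0 asks spaCy to put
# the model on the first GPU. (Shown here on a blank pipeline; in the
# recipe, nlp will be whatever model you loaded.)
nlp = spacy.blank("en")
nlp.add_pipe(nlp.create_pipe("ner"))
optimizer = nlp.begin_training(use_device=0)
```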

@SandeepNaidu: Could you provide some details of which recipe you’re using, and how much data you have?

Some training tasks will in fact take hours; this is something I’m very interested in improving within spaCy, and something the wider deep learning community is actively working on as well. But it could be that there’s some low-hanging fruit within Prodigy that we can address; we might have some redundant operations, for instance.
