CPU Multiprocessing support for Recipes

clrogers · February 13, 2018, 5:47am

I’m working on a classification project with approximately 850 classes and 240k training samples. Currently, textcat.batch-train takes over 7 hours per training iteration and doesn’t appear to take advantage of the 24 cores available on my machine. Are recipes multiprocessing-capable? If so, how do I enable this feature?

honnibal · February 13, 2018, 7:42am

The most expensive part of training should be in the matrix multiplications, which numpy should be delegating to a BLAS library. It’s possible something hasn’t been compiled with proper optimisations for your machine, but my guess is this is more likely to be an issue with the hyper-parameters.

Some questions to help figure this out:

If you’re using Linux, are you able to watch the process in perf top? If so, what function does it seem to spend most time within?
How long are your texts?
Have you tried increasing the batch size? The default of 10 is tuned for low-data use-cases, as our default assumption is that the data was manually annotated. If you increase the batch size to 32, do you get better CPU utilisation?

Finally, 850 labels is actually more than I’ve trained spaCy’s textcat with. It’s possible I’m making an expensive loop somewhere, so there might be a simple optimisation I can make within spaCy. I’ll have a look.

clrogers · February 13, 2018, 4:04pm

Matthew,

Thanks for a quick reply.

The most expensive part of training should be in the matrix multiplications, which numpy should be delegating to a BLAS library. It’s possible something hasn’t been compiled with proper optimisations for your machine,

I'm using the Intel MKL library, and scipy.show_config() indicates that scipy has been built against MKL. Additionally, gensim.models.word2vec.FAST_VERSION == 1.
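
One rough way to check whether the BLAS backend is actually using multiple threads is to time a large matrix multiplication directly in numpy, outside of any recipe. The sketch below is only an illustration: the helper name time_matmul and the matrix size are made up for this example, and it measures numpy alone, not spaCy or Prodigy.

```python
import time

import numpy as np


def time_matmul(n=2000, repeats=3):
    """Return the best wall-clock time (seconds) for an n x n float32 matmul.

    time_matmul is a hypothetical helper written for this example only.
    """
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n), dtype=np.float32)
    b = rng.standard_normal((n, n), dtype=np.float32)
    a @ b  # warm-up so one-time setup cost is not measured
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    np.show_config()  # prints which BLAS/LAPACK numpy was built against
    print(f"best matmul time: {time_matmul():.3f}s")
```

Running it once normally and once with the MKL_NUM_THREADS environment variable set to 1 before Python starts should show a clear difference if MKL threading is in effect. Similar timings in both runs would suggest the multiplication is running single-threaded either way.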

Here is a snapshot of what I'm running. Data was loaded using db-in. Classes are imbalanced, with a minimum of 6 positive examples and a maximum of 280. Classes are mutually exclusive, and for each class I randomly generated as many negative examples as it has positive examples.

If you’re using Linux, are you able to watch the process in perf top? If so, what function does it seem to spend most time within?

I'm running this on Windows 10 and the Windows Resource Monitor / Task Manager only shows a single python process.
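
Where perf top isn't available, Python's built-in cProfile module can give a rougher answer to the same question. The sketch below is a minimal illustration, not part of any recipe: profile_call and fake_training_step are hypothetical names, and the stand-in function would need to be replaced with the real training call. Note that cProfile only attributes time to Python-level and built-in function calls, so time spent inside BLAS shows up under whichever function called into it.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    profile_call is a hypothetical helper written for this example only.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the `top` most expensive entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def fake_training_step(n=200_000):
    """Hypothetical stand-in for one training update; swap in the real call."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(fake_training_step)
    print(report)
```

The functions at the top of the report, sorted by cumulative time, are the first candidates for an expensive loop.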

How long are your texts?

I'm trying to classify product descriptions. Word counts:
Min: 53
Mean: 692
Max: 2004

Have you tried increasing the batch size? The default of 10 is tuned for low-data use-cases, as our default assumption is that the data was manually annotated. If you increase the batch size to 32, do you get better CPU utilisation?

I've tried batch sizes of 10, 32, 64, 128 and 256. While total time goes down as the batch size increases, I don't think CPU utilization is any better. A bigger batch size does result in slightly higher it/s, but nothing drastic.
