I've spent an enormous amount of time in the documentation trying to find out how to speed up the training process for my labeled dataset.
```
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 63291 | Evaluation: 15822 (20% split)
Training: 63290 | Evaluation: 15822
```
Training one epoch takes roughly two hours, and what's very strange is that it doesn't even use 15% of the available CPU.
```
prodigy train --ner nel_skills_large1 model --base-model en_core_web_md
```
I'd love any suggestions for parameters (in config.cfg) that could speed up the process (I tried "batch_size", no luck).
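For reference, the settings I was tweaking live in the [nlp] and [training.batcher] blocks of config.cfg. This is just an illustrative sketch using spaCy's default batcher, not my exact values:

```ini
[nlp]
# batch size used by nlp.pipe, e.g. during evaluation
batch_size = 1000

[training.batcher]
# spaCy's default batcher groups examples by padded word count
@batchers = "spacy.batch_by_words.v1"
size = 2000
tolerance = 0.2
discard_oversize = false
get_length = null
```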
Two hours per epoch definitely sounds pretty long. Are you sure you're not running out of memory or disk space? You could run some profiling to take a look at what's particularly slow and whether memory is an issue.
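If it helps, here's one way to do that from the command line, assuming Python 3.8+ (cProfile's -m flag was added in 3.8). Cutting the run short with a --training.max_steps override keeps the profile cheap, but check prodigy train --help to confirm your version accepts spaCy-style config overrides like that:

```
# profile a shortened training run and write stats to train.prof
python -m cProfile -o train.prof -m prodigy train --ner nel_skills_large1 model \
    --base-model en_core_web_md --training.max_steps 200

# show the 25 most expensive calls by cumulative time
python -c "import pstats; pstats.Stats('train.prof').sort_stats('cumulative').print_stats(25)"
```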
Alternatively, you could try streaming in your corpus by setting max_epochs = -1, in case it's too large to fit into memory. See the second part of this section for details: https://spacy.io/usage/training#custom-code-readers-batchers. This is slightly more involved, though: you need to handle your own shuffling and make sure all labels are initialised, since the corpus isn't available in memory and spaCy can't just process it to read off all the available labels.
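To make that concrete, here's a minimal sketch of a streaming reader, loosely following the linked docs. The reader name, the JSONL input format, and the buffered shuffling are all assumptions for illustration:

```python
# functions.py — a registered corpus reader that streams Examples
# instead of loading the whole corpus into memory
from typing import Callable, Iterable, Iterator
import random

import spacy
import srsly
from spacy.language import Language
from spacy.training import Example


@spacy.registry.readers("stream_jsonl_reader.v1")  # hypothetical name
def stream_jsonl_reader(
    path: str, shuffle_buffer: int = 1000
) -> Callable[[Language], Iterable[Example]]:
    def generate_stream(nlp: Language) -> Iterator[Example]:
        buffer = []
        # assumes one {"text": ..., "entities": [[start, end, label], ...]}
        # object per line
        for eg in srsly.read_jsonl(path):
            doc = nlp.make_doc(eg["text"])
            buffer.append(Example.from_dict(doc, {"entities": eg["entities"]}))
            # the corpus never sits fully in memory, so shuffle
            # approximately via a fixed-size buffer
            if len(buffer) >= shuffle_buffer:
                random.shuffle(buffer)
                yield from buffer
                buffer = []
        random.shuffle(buffer)
        yield from buffer

    return generate_stream
```

In config.cfg you'd then point the training corpus at the reader, set max_epochs = -1, and initialise the ner labels from a file (spacy init labels can generate one), since spaCy can't infer them from a stream:

```ini
[corpora.train]
@readers = "stream_jsonl_reader.v1"
path = "corpus/train.jsonl"

[training]
max_epochs = -1

[initialize.components.ner.labels]
@readers = "spacy.read_labels.v1"
path = "corpus/labels/ner.json"
```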