understanding the different terminology in the command line output of a training pipeline

nanyasrivastav · June 20, 2022, 4:53pm

Hi, just a quick question regarding what the different columns of a training pipeline mean. I'm new to NLP and have been searching for resources to understand basic concepts. For instance, I was wondering what the following terms mean:

initial learn rate
what 'E' and '#' stand for in the output
what loss tok2vec and loss ner signify

I understand it would be difficult to explain these here, so I was hoping to be directed to good study material, blogs, or documentation explaining what these terms mean. This would help me analyze my outputs better and identify any issues with my training data.

ryanwesslen · June 20, 2022, 6:17pm

Hi @nanyasrivastav!

Thank you for your questions! These concepts are very important. I'm glad you've asked because it's hard to use Prodigy effectively without understanding them.

Initial learn rate: The learning rate used at the start of training. The learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. The value shown is a default, but it can be tuned. Tuning it is a more advanced step because it involves a trade-off: a smaller learning rate converges slowly, while a larger one converges faster but risks overshooting the minimum. See Wikipedia's article on Learning Rate.
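To make the trade-off concrete, here is a minimal sketch of plain gradient descent on a toy loss, f(x) = x². This is not what spaCy or Prodigy run internally (they use a more sophisticated optimizer); the function name `gradient_descent` and the specific learning rates are assumptions chosen only for illustration.

```python
def gradient_descent(learn_rate, steps=20, start=5.0):
    """Minimize the toy loss f(x) = x**2 with plain gradient descent."""
    x = start
    for _ in range(steps):
        grad = 2 * x                 # derivative of x**2 at the current x
        x = x - learn_rate * grad    # step size is controlled by learn_rate
    return x


# The minimum of x**2 is at x = 0, so a final value closer to 0 is better.
small = gradient_descent(0.01)      # converges, but slowly
moderate = gradient_descent(0.1)    # converges much faster
too_large = gradient_descent(1.1)   # overshoots on every step and diverges

print(f"small:     {small:.4f}")
print(f"moderate:  {moderate:.4f}")
print(f"too large: {too_large:.4f}")
```

After the same number of steps, the small rate is still far from the minimum, the moderate rate is close to it, and the overly large rate ends up further away than where it started.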

E: This is the number of completed epochs. An epoch can be thought of as one full pass over the training data. The training data is used multiple times by the algorithm (e.g., gradient descent) because it typically doesn't reach a (global or local) minimum in its first epoch. See this Stack Exchange comment for more details.

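#: Number of iterations (training steps) completed so far. This count does not reset between epochs; it keeps increasing as the training data is re-used. For example, if you have 114 training documents and each iteration processes one document, your first epoch will be completed after iteration 114, your second epoch after iteration 228, etc. When documents are processed in batches, each iteration covers a whole batch, so an epoch takes fewer iterations (see the batch_size discussion linked below).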
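The relationship between the E and # columns can be sketched with a little arithmetic. The helper names below (`steps_per_epoch`, `epoch_end_steps`) are hypothetical, and the fixed `batch_size` is a simplifying assumption: in an actual training run the batch size may vary from step to step, so treat this as an illustration rather than a description of the training loop.

```python
import math


def steps_per_epoch(n_docs, batch_size=1):
    """Iterations needed for one full pass over n_docs documents."""
    return math.ceil(n_docs / batch_size)


def epoch_end_steps(n_docs, n_epochs, batch_size=1):
    """Cumulative iteration count at the end of each epoch."""
    per_epoch = steps_per_epoch(n_docs, batch_size)
    return [per_epoch * epoch for epoch in range(1, n_epochs + 1)]


# One document per iteration: matches the 114 / 228 example above.
print(epoch_end_steps(114, 3))                 # [114, 228, 342]

# Eight documents per iteration: each epoch needs ceil(114 / 8) = 15 steps.
print(epoch_end_steps(114, 3, batch_size=8))   # [15, 30, 45]
```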

LOSS: The loss is the value of the loss function for the associated pipeline component. spaCy models can be made up of multiple components in a pipeline. As you can see in your pipeline, you have two components: tok2vec (the token-to-vector layer, which produces a vector representation for each token) and ner (the named entity recognizer). By default in spaCy 3.0, the components in a pipeline share the same tok2vec component. This can be changed so that each component has its own independent tok2vec layer, which is more modular but leads to larger and slower-to-train pipelines.

Here are a few general resources I would recommend reading:

There are also a few helpful Prodigy Support and spaCy community discussions that may help:

Specific formula for F score, precision and recall NER
How to measure NER model's loss in a validation set? #6574
Custom NER training - understanding loss and learn rate schedule #10682
Understanding training output for textcat_multilabel - steps vs epochs #10343: This explains the difference between epochs and iterations. It also makes the connection with batch_size, which is another hyperparameter.
How to set Overrides in Prodigy. This shows how to customize training procedures (e.g., learn_rate, batch_size, etc.). You can pass overrides as an argument (easiest) or modify your config.cfg file (more advanced).

Thanks again for your question and let us know if you have any further questions!

nanyasrivastav · June 20, 2022, 9:37pm

Thank you @ryanwesslen for taking the time to put all of this together!
