Understanding the different terminology in the command-line output of a training pipeline

Hi, just a quick question about what the different columns in the output of a training pipeline mean. I'm new to NLP and have been searching for resources to understand basic concepts. For instance, I was wondering what the following terms mean:

  1. initial learn rate
  2. what 'E' and '#' stand for in the output
  3. what loss tok2vec and loss ner signify

I understand it would be difficult to explain all of these here, so I was hoping to be directed to good study material, blogs, or documentation that explains what these terms mean. This would help me analyze my outputs better and identify any issues with my training data.

Hi @nanyasrivastav!

Thank you for your questions! These concepts are very important. I'm glad you've asked because it's hard to effectively use Prodigy without understanding them :slight_smile:.

Initial learn rate: The learning rate used at the start of training. The learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. This is a default value, but it can be tuned for your training run. Tuning it is an advanced step because it involves a trade-off: a learning rate that is too small makes training converge slowly, while one that is too large can overshoot the minimum. See Wikipedia's article on Learning Rate.
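To make the convergence/overshooting trade-off concrete, here is a minimal sketch using plain gradient descent on a toy loss, f(w) = (w - 3)**2, whose minimum is at w = 3. The function name `gradient_descent` and the chosen learning rates are illustrative assumptions; this is not how spaCy or Prodigy implement their optimizer.

```python
def gradient_descent(learn_rate, steps=50, start=0.0):
    """Minimise the toy loss f(w) = (w - 3)**2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)      # derivative of (w - 3)**2 with respect to w
        w -= learn_rate * grad  # the step size is the gradient scaled by the learning rate
    return w


for lr in (0.01, 0.1, 1.1):
    print(f"learn_rate={lr}: w={gradient_descent(lr):.4f}")
```

With 0.01 the steps are so small that w is still well short of 3 after 50 iterations, with 0.1 it ends up very close to 3, and with 1.1 each step jumps past the minimum by a larger amount than the last, so w moves further away instead of converging.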

E: The number of completed epochs. An epoch is one full pass over the training data. Optimization algorithms such as gradient descent go over the training data multiple times because they typically don't reach a (global or local) minimum after the first epoch. See this Stack Exchange comment for more details.
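Why one pass is usually not enough can be seen in a small sketch: fitting y = w * x with per-example gradient descent and recording the error after each epoch. The toy data, the `train` helper, and the learning rate are all hypothetical and only illustrate the idea of repeated passes; spaCy's real training loop is considerably more involved.

```python
import random


def train(data, epochs, learn_rate=0.01, seed=0):
    """Fit y = w * x with per-example gradient descent.

    Returns the mean squared error measured after each epoch.
    """
    rng = random.Random(seed)
    w = 0.0
    history = []
    for _ in range(epochs):
        examples = data[:]
        rng.shuffle(examples)  # reshuffle the training data at the start of every epoch
        for x, y in examples:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
            w -= learn_rate * grad
        mse = sum((w * x - y) ** 2 for x, y in data) / len(data)
        history.append(mse)
    return history


data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]  # true relationship: y = 2x
history = train(data, epochs=10)
for epoch, mse in enumerate(history):
    print(f"E={epoch}  mse={mse:.6f}")
```

After the first epoch the error is still clearly above zero; it keeps shrinking only because the same five examples are reused in later epochs.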

#: The number of training iterations (steps) completed so far. This count keeps increasing across epochs instead of resetting when a new epoch starts. Each step processes one batch of training examples, so if every batch held a single document and you had 114 training documents, your first epoch would be completed after iteration 114, your second epoch after iteration 228, and so on.
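The relationship between documents, batch size, and the step counter can be sketched as follows. This assumes fixed-size batches where each batch counts as one step; real batchers (spaCy's defaults group examples by size) can produce batches of varying size, so treat the numbers as an approximation. The helper name `epoch_boundaries` is hypothetical.

```python
import math


def epoch_boundaries(n_docs, batch_size, n_epochs):
    """Return the step count (#) at which each epoch finishes.

    Assumes every epoch is split into ceil(n_docs / batch_size) batches
    and that each batch counts as exactly one step.
    """
    steps_per_epoch = math.ceil(n_docs / batch_size)
    return [steps_per_epoch * (epoch + 1) for epoch in range(n_epochs)]


print(epoch_boundaries(114, batch_size=1, n_epochs=3))
print(epoch_boundaries(114, batch_size=8, n_epochs=2))
```

With a batch size of 1 this reproduces the 114 / 228 example above; with batches of 8 documents, each epoch takes only 15 steps, so the # column grows much more slowly relative to the number of documents seen.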

LOSS: The loss is the value of the loss function for the associated pipeline component. It measures how far that component's predictions are from the correct annotations, so lower is better, and you generally want to see it decrease as training progresses. spaCy pipelines can be made up of multiple components. Your pipeline has two: tok2vec, which computes a vector representation for each token, and ner, the named entity recognizer. By default in spaCy 3.0, components in a pipeline can share the same tok2vec component. You can instead give each component its own independent token-to-vector layer, which is more modular but makes the pipeline larger and slower to train.
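As a generic illustration of what a loss value captures, here is a sketch of a cross-entropy loss over per-token label probabilities. The label set, probabilities, and `cross_entropy` helper are hypothetical, and spaCy's NER component computes its loss differently (it is transition-based), so this only shows the general idea that confident, correct predictions give a low loss.

```python
import math


def cross_entropy(probs, gold):
    """Sum over tokens of -log(probability assigned to the correct label)."""
    return sum(-math.log(p[label]) for p, label in zip(probs, gold))


labels = ["O", "B-ORG", "B-PERSON"]  # hypothetical label set
gold = [0, 1]                        # token 1 should be "O", token 2 "B-ORG"

confident = [[0.90, 0.05, 0.05], [0.10, 0.85, 0.05]]  # mostly right and sure of it
unsure = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]]     # right, but barely

print(f"confident: {cross_entropy(confident, gold):.4f}")
print(f"unsure:    {cross_entropy(unsure, gold):.4f}")
```

Both prediction sets pick the correct label for each token, but the unsure one receives a much higher loss, which is why the loss can keep falling during training even when accuracy looks unchanged.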

Here are a few general resources I would recommend reading:

There are also a few helpful Prodigy Support and spaCy community discussions that may help:

Thanks again for your question and let us know if you have any further questions!

Thank you @ryanwesslen for taking the time to put all of this together!