Hi @nanyasrivastav!
Thank you for your questions! These concepts are very important, and I'm glad you asked, because it's hard to use Prodigy effectively without understanding them.
Initial learn rate: The initial learning rate used by the optimizer. The learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. The value shown is a default, but it can be tuned for training. Tuning the learning rate is an advanced technique, since changing it trades off the rate of convergence against the risk of overshooting the minimum. See Wikipedia's article on Learning Rate.
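To make that trade-off concrete, here's a toy gradient-descent sketch on f(x) = x² (minimum at x = 0). The function name and values are illustrative only, not anything from Prodigy or spaCy:

```python
# Toy gradient descent on f(x) = x**2 to illustrate the learning-rate trade-off.
def minimize(learn_rate, steps=20, x=1.0):
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2
        x -= learn_rate * grad
    return x

print(minimize(0.1))   # small step size: converges toward the minimum at 0
print(minimize(1.1))   # step size too large: overshoots and diverges
```

With `learn_rate=0.1` each step shrinks x by a factor of 0.8, so it converges; with `learn_rate=1.1` each step multiplies x by -1.2, so it oscillates and blows up.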
E: This is the number of completed epochs. An epoch can be thought of as one full pass over the training data. The training data is passed through the algorithm (e.g., gradient descent) multiple times because training typically doesn't reach a (global or local) minimum in its first epoch. See this Stack Exchange comment for more details.
#: Number of iterations, or documents, processed during training. This counter keeps incrementing as documents are reused in later epochs. For example, if you have 114 training documents, your first epoch will be completed after iteration 114, your second epoch after iteration 228, and so on.
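The relationship between the # (iteration) and E (epoch) columns can be sketched in a couple of lines, using the 114-document example above (the function name is illustrative, not part of Prodigy's output):

```python
# How the iteration counter (#) maps onto completed epochs (E),
# assuming 114 training documents as in the example above.
N_DOCS = 114

def completed_epochs(iteration, n_docs=N_DOCS):
    """Full passes over the training data after `iteration` documents."""
    return iteration // n_docs

print(completed_epochs(114))  # 1 (first epoch just finished)
print(completed_epochs(228))  # 2 (second epoch just finished)
print(completed_epochs(171))  # 1 (partway through the second epoch)
```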
LOSS: The loss is the value of the loss function for the associated pipeline component. spaCy models can be made up of multiple components in a pipeline. As you can see, your pipeline has two components: `tok2vec` (which produces a vector representation for each token) and `ner` (the named entity recognition component). By default in spaCy 3.0, pipeline components will share the same `tok2vec` component. This can be changed so that each component has its own independent `tok2vec`, which is more modular but leads to larger and slower-to-train pipelines.
Here are a few general resources I would recommend reading:
- spaCy 101, especially the sections on pipelines, architecture, and training
- spaCy Training Pipelines & Models
There are also a few helpful Prodigy Support and spaCy community discussions that may help:
- Specific formula for F score, precision and recall NER
- How to measure NER model's loss in a validation set? #6574
- Custom NER training - understanding loss and learn rate schedule #10682
- Understanding training output for textcat_multilabel - steps vs epochs #10343: This explains the difference between epochs and iterations. It also makes a connection with `batch_size`, which is another hyperparameter.
- How to set Overrides in Prodigy: This will show you how to customize training procedures (e.g., `learn_rate`, `batch_size`, etc.). You can pass overrides as arguments (easiest) or modify your `config.cfg` file (more advanced).
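For example, here's a minimal sketch of the section of `config.cfg` where the learning rate lives (the value shown is spaCy 3.x's default; check your generated config for the exact section names before editing):

```ini
# The optimizer block controls the learning rate discussed above.
[training.optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001
```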
Thanks again for your question and let us know if you have any further questions!