How to understand training.max_steps in train

Hello, I am training an NER model using this command. I found that the score varies when I change --training.max_steps in my command, e.g., the score for training.max_steps=1000 is 0.29, for training.max_steps=2000 it is 0.58, and it goes up to 0.9 when --training.max_steps=0. I do not quite understand why. I was wondering if you could help me understand the training.max_steps setting and its impact on the model performance metrics? Thank you!

Hi there!

Could you share the train commands as well as the output of the train call? It sounds somewhat strange that --training.max_steps=0 would ever lead to a higher score, unless you're running the train command with a pretrained model that's already learned from the data.

Of course. Thank you for your thoughts!

python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats

python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats

[screenshot of the train output]

When I look at your output it seems that the scores and losses can fluctuate, but that they eventually trend in the right direction. That's normal. This can be due to gradient descent overstepping, and there are other numeric phenomena that can cause this effect as well.
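To make "overstepping" concrete, here's a toy sketch in plain Python (not Prodigy or spaCy code, just an illustration): gradient descent on f(x) = x^2 with an oversized learning rate keeps jumping across the minimum, so the loss zigzags down instead of falling smoothly.

# Toy illustration of gradient descent overstepping: minimizing f(x) = x^2
# with a learning rate that's too large makes each update overshoot the
# minimum at x = 0, so the loss zigzags instead of decreasing smoothly.
x, lr = 1.0, 0.9
for step in range(6):
    grad = 2 * x        # derivative of x^2
    x -= lr * grad      # overshoots past zero on every step
    print(f"step {step}: x = {x:+.4f}, loss = {x * x:.4f}")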

This is also why the training procedure stores two models: model-best and model-last. The most recent iteration of the model might not be the best-performing one. This way, you won't need to worry too much about max_steps.
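If you want to inspect both variants yourself after training, here's a minimal sketch, assuming the output directory from your command (citation_2nd_1_model) and a made-up example sentence:

import spacy

# Both variants are saved inside the output directory of the train command.
best = spacy.load("citation_2nd_1_model/model-best")
last = spacy.load("citation_2nd_1_model/model-last")

text = "The article appeared in the Harvard Law Review in 2019."  # made-up example
for name, nlp in (("model-best", best), ("model-last", last)):
    doc = nlp(text)
    print(name, [(ent.text, ent.label_) for ent in doc.ents])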

Does this help, or is there something still unclear? If so, could you point to a specific metric in the loss table that's confusing and explain why?

Thank you for explaining that. It is helpful!
I guess my confusion came from looking at my detailed stats. As we annotated our data with different labels, the scores of the different labels differ a lot when the max steps are set differently. I was wondering if you could help me understand that? Thank you!
max steps = 0
[screenshot of per-label stats]
max steps = 2000
[screenshot of per-label stats]

Just to check: when you run the stats for a model with --training.max_steps=0, are you sure it's not picking up a pre-trained model? Also, could you elaborate on what you mean by "we annotated our data with different labels"? If you're training models on two different sets of data, that would also help explain the difference.

Could you also share the commands that you use to generate these tables? Also, please feel free to post the stats as inline code instead of screenshots. It's much easier to search/copy/paste in text.

We used ner.manual to annotate our data. The command is

python -m prodigy ner.manual citation_2nd_1 blank:en .\citation_2nd\To-annotate\beyond_profit.txt --label AGENCY_R,CASES,FOREIGN,CONSTITUTIONS, ...

By doing the ner.manual annotation, we collected about 1400 annotations with different labels. After that, I trained the initial model using the commands below. The only difference between them is the max steps setting.

python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats

I guess what confused me is the score of the same label: for example, the F-score for JOURNAL is 92.96 when max steps = 0, but 73.68 when max steps = 2000. Thank you for your thoughts!

Could you try running these commands:

python -m prodigy train model-with-2000-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
python -m prodigy train model-with-1-step --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=1 --label-stats
python -m prodigy train model-with-0-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats

I worry that the results are skewed because both runs point to the same output folder. The commands above each save their model into its own folder.
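Once those runs are done, you can also compare what each one saved without retraining. A small sketch, assuming the folder names above and that each trained pipeline's meta.json stores a performance section (as spaCy-trained pipelines normally do):

import json
from pathlib import Path

# Print the evaluation metrics each run stored alongside its model.
for folder in ("model-with-0-steps", "model-with-1-step", "model-with-2000-steps"):
    meta = json.loads((Path(folder) / "model-last" / "meta.json").read_text(encoding="utf8"))
    print(folder, meta.get("performance", {}))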

Hello! I followed the commands you suggested and noticed that the results are quite similar to my previous trials. Can you please share your thoughts on this?

python -m prodigy train model-with-2000-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
  0     200       2719.65   3329.99   14.97   24.56   10.77    0.15
  0     400      13771.61   5036.63   15.46   23.44   11.54    0.15
  1     600      14911.51   4987.48   14.43   21.88   10.77    0.14
  2     800      24490.44   4489.47   36.51   39.64   33.85    0.37
  3    1000      23853.08   4820.55   29.29   32.11   26.92    0.29
  3    1200      19903.98   4142.67   53.22   60.19   47.69    0.53
  5    1400      40800.60   5401.68   42.98   46.43   40.00    0.43
  6    1600      41073.55   5112.98   50.00   51.64   48.46    0.50
  8    1800      53870.93   6966.36   44.08   46.96   41.54    0.44
 10    2000      66122.85   8084.27   58.27   59.68   56.92    0.58
✔ Saved pipeline to output directory
model-with-2000-steps\model-last

=============================== NER (per type) ===============================

                    P       R       F
JOURNAL         65.12   84.85   73.68
WORKING          0.00    0.00    0.00
MEDIA           20.00   16.67   18.18
BOOK            66.67   50.00   57.14
CASE_S          58.62   80.95   68.00
WEBSITES        55.56   27.78   37.04
STATUTES        83.33   83.33   83.33
COURT           50.00   33.33   40.00
DATASETS         0.00    0.00    0.00
AGENCY_D         0.00    0.00    0.00
CONSTITUTIONS    0.00    0.00    0.00

python -m prodigy train model-with-0-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
  0     200       2719.65   3329.99   14.97   24.56   10.77    0.15
  0     400      13771.61   5036.63   15.46   23.44   11.54    0.15
  1     600      14911.51   4987.48   14.43   21.88   10.77    0.14
  2     800      24490.44   4489.47   36.51   39.64   33.85    0.37
  3    1000      23853.08   4820.55   29.29   32.11   26.92    0.29
  3    1200      19903.98   4142.67   53.22   60.19   47.69    0.53
  5    1400      40800.60   5401.68   42.98   46.43   40.00    0.43
  6    1600      41073.55   5112.98   50.00   51.64   48.46    0.50
  8    1800      53870.93   6966.36   44.08   46.96   41.54    0.44
 10    2000      66122.85   8084.27   58.27   59.68   56.92    0.58
 12    2200      62738.91   6870.88   62.02   62.50   61.54    0.62
 15    2400      72402.73   6430.36   65.61   67.48   63.85    0.66
 19    2600      75198.70   6506.38   68.68   67.41   70.00    0.69
 22    2800     105203.06   9315.15   71.04   71.32   70.77    0.71
 26    3000      70133.46   5546.94   70.99   70.45   71.54    0.71
 29    3200      65737.74   5291.23   77.22   77.52   76.92    0.77
 33    3400      50814.73   3965.48   80.30   79.10   81.54    0.80
 36    3600      51978.00   3492.22   83.33   82.09   84.62    0.83
 40    3800      44422.39   3439.81   85.71   86.05   85.38    0.86
 43    4000      38814.58   2563.37   86.36   85.07   87.69    0.86
 47    4200      69109.05   2896.03   83.40   83.72   83.08    0.83
 51    4400      47747.09   2280.28   84.17   84.50   83.85    0.84
 54    4600      24583.09   1518.34   87.79   87.12   88.46    0.88
 58    4800      23161.80   1472.89   86.49   86.82   86.15    0.86
 61    5000      36885.37   1092.25   89.39   88.06   90.77    0.89
 65    5200      20819.10    757.23   90.15   88.81   91.54    0.90
 68    5400      18255.33    641.18   87.88   86.57   89.23    0.88
 72    5600      17563.40    499.21   89.15   89.84   88.46    0.89
 75    5800       5718.46    393.41   90.49   89.47   91.54    0.90
 79    6000      17640.38    315.97   87.79   87.12   88.46    0.88
 82    6200      25171.84    532.83   89.58   89.92   89.23    0.90
 86    6400      18583.70    426.25   88.89   88.55   89.23    0.89
 89    6600      27922.14    630.66   89.23   89.23   89.23    0.89
 93    6800      24200.22    401.64   89.81   88.15   91.54    0.90
 96    7000      18341.90    345.09   87.88   86.57   89.23    0.88
100    7200      18282.35    246.37   88.30   86.67   90.00    0.88
103    7400      67315.05    359.63   88.80   89.15   88.46    0.89
✔ Saved pipeline to output directory
model-with-0-steps\model-last

=============================== NER (per type) ===============================

                     P        R        F
JOURNAL          86.84   100.00    92.96
WORKING         100.00    50.00    66.67
MEDIA            81.82    75.00    78.26
WEBSITES         94.12    88.89    91.43
BOOK             94.74    90.00    92.31
STATUTES         85.71   100.00    92.31
COURT            71.43    83.33    76.92
DATASETS         50.00   100.00    66.67
CASE_S          100.00    95.24    97.56
AGENCY_D        100.00   100.00   100.00
CONSTITUTIONS   100.00   100.00   100.00

Ah! OK! Now I see. I misread one of your earlier comments, sorry about that! I now understand that it runs the full loop when the number of training steps is equal to zero. I initially had the impression this was about fluctuations in the score. Again, my bad!

I just ran everything myself locally and it seems like we've caught a small bug. I was able to reproduce this on a simple NER example in Prodigy, but also on a textcat use-case that relies on spaCy alone. Because the behavior also occurs there, it feels safe to say this issue isn't caused by Prodigy.

Here's an example run with 0 steps.

python -m spacy train data/dataset/config.cfg --output training/dataset/ --paths.train data/dataset/train.spacy --paths.dev data/dataset/dev.spacy --training.max_steps=0

Notice how many steps it runs:

============================= Training pipeline =============================
ℹ Pipeline: ['textcat_multilabel']
ℹ Initial learn rate: 0.001
E    #       LOSS TEXTC...  CATS_SCORE  SCORE 
---  ------  -------------  ----------  ------
  0       0           0.25       57.55    0.58
  0     200          51.73       88.56    0.89
  1     400           7.81       91.20    0.91
  2     600           1.40       92.23    0.92
  3     800           0.60       91.50    0.91
  4    1000           0.27       91.64    0.92
  5    1200           0.22       91.50    0.91
  6    1400           0.14       91.20    0.91
  7    1600           0.14       91.35    0.91
  8    1800           0.10       91.64    0.92
  9    2000           0.09       91.20    0.91
 10    2200           0.08       91.50    0.91
✔ Saved pipeline to output directory

But here's the same one with only one step.

python -m spacy train data/dataset/config.cfg --output training/dataset/ --paths.train data/dataset/train.spacy --paths.dev data/dataset/dev.spacy --training.max_steps=1

And now we see far fewer steps.

============================= Training pipeline =============================
ℹ Pipeline: ['textcat_multilabel']
ℹ Initial learn rate: 0.001
E    #       LOSS TEXTC...  CATS_SCORE  SCORE 
---  ------  -------------  ----------  ------
  0       0           0.25       57.55    0.58
✔ Saved pipeline to output directory

It seems that if the maximum number of steps is equal to zero, it assumes a default instead.

Again, sorry about the unnecessary back and forth. It seems like --training.max_steps=0 triggers unintuitive default behavior. Just to check, though: when you run with --training.max_steps=1, do you see the same behavior as me?

Thank you so much for looking into this! I reran the commands with max_steps=0, max_steps=1, and max_steps=2000. For max_steps=0 and max_steps=2000, the results remain the same. And for max_steps=1, my results are:
(ENV) PS C:\Users\rapiduser\Workspace\zhang-annotations> python -m prodigy train model-with-1-step --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=1 --label-stats
ℹ Using CPU

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
✔ Generated training config

=========================== Initializing pipeline ===========================
[2023-03-13 23:32:28,829] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
[2023-03-13 23:32:29,126] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-03-13 23:32:29,142] [INFO] Created vocabulary
[2023-03-13 23:32:29,142] [INFO] Finished initializing nlp object
[2023-03-13 23:32:31,798] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline

============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
✔ Saved pipeline to output directory
model-with-1-step\model-last

=============================== NER (per type) ===============================

                    P       R       F
JOURNAL          0.00    0.00    0.00
WORKING          0.00    0.00    0.00
MEDIA            0.00    0.00    0.00
WEBSITES         0.00    0.00    0.00
BOOK             0.00    0.00    0.00
STATUTES         0.00    0.00    0.00
COURT            0.00    0.00    0.00
DATASETS         0.00    0.00    0.00
CASE_S           0.00    0.00    0.00
AGENCY_D         0.00    0.00    0.00
CONSTITUTIONS    0.00    0.00    0.00


I just checked in with a spaCy colleague who pointed me to the spaCy API docs, where this behavior is documented. To quote what is mentioned there for the max_steps argument:

Maximum number of update steps to train for. 0 means an unlimited number of steps. Defaults to 20000.

So that explains the behavior!
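For completeness: the CLI override maps directly onto the [training] block of the config, and the same run can be kicked off from Python. A minimal sketch, using the placeholder paths from the commands above and spaCy's programmatic train helper:

from spacy.cli.train import train  # programmatic equivalent of `spacy train`

train(
    "data/dataset/config.cfg",
    output_path="training/dataset",
    overrides={
        "paths.train": "data/dataset/train.spacy",
        "paths.dev": "data/dataset/dev.spacy",
        # Maps onto [training] max_steps: 0 means "no step limit" (the run
        # then ends via the early-stopping patience setting); the default
        # is 20000.
        "training.max_steps": 2000,
    },
)

So if you want a genuinely short run, pass a small positive number; 0 hands control to the early-stopping logic instead.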

That makes sense. Thank you!
