Hello, I am training an NER model with the prodigy train command. I found that the score varies when I change --training.max_steps: e.g., the score with training.max_steps=1000 is 0.29, with training.max_steps=2000 it is 0.58, and it goes up to 0.9 when --training.max_steps=0. I do not quite understand why. Could you help me understand the training.max_steps setting and its impact on the model performance metrics? Thank you!
Hi there!
Could you share the train commands as well as the output of the train call? It sounds somewhat strange that --training.max_steps=0 would ever lead to a higher score, unless you're running the train command with a pretrained model that's already learned from the data.
Of course. Thank you for your thoughts!
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
When I look at your output, it seems that the scores and losses can fluctuate, but that they trend in the right direction over time. That's normal. This can be due to gradient descent overstepping, and there are other numeric phenomena that can cause this effect as well.
This is also why the training procedure stores two models: model-best and model-last. The most recent iteration of the model might not be the best-performing one, so the best checkpoint is kept separately. This way, you won't need to worry too much about max_steps.
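If you ever want to compare the two checkpoints directly, you can load them with spaCy. Here's a minimal sketch, assuming the output directory from your train command (the example sentence is made up):

import spacy

# Prodigy's train recipe saves two pipelines in the output directory:
# model-best (the highest-scoring evaluation) and model-last (the final step)
nlp_best = spacy.load("citation_2nd_1_model/model-best")
nlp_last = spacy.load("citation_2nd_1_model/model-last")

# hypothetical input, just to eyeball the difference between the two
doc = nlp_best("See Smith v. Jones, 123 F.3d 456 (2d Cir. 1997).")
print([(ent.text, ent.label_) for ent in doc.ents])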
Does this help, or is something still unclear? If so, could you point to a specific metric in the loss table that's confusing and explain why?
Thank you for explaining that. It is helpful!
I guess my confusion comes from looking at my detailed stats. As we annotated our data with different labels, the scores of the different labels differ a lot when max steps is set differently. I was wondering if you could help me understand that? Thank you!
max_steps=0
max_steps=2000
Just to check: when you run the stats for a model with --training.max_steps=0, are you sure it's not picking up a pre-trained model? Also, could you elaborate on what you mean by "we annotated our data with different labels"? If you're training models on two different sets of data, that would also help explain the difference.
Could you also share the commands that you use to generate these tables? Also, please feel free to post the stats as inline code instead of screenshots. It's much easier to search/copy/paste in text.
We used ner.manual to annotate our data. The command is
python -m prodigy ner.manual citation_2nd_1 blank:en .\citation_2nd\To-annotate\beyond_profit.txt --label AGENCY_R,CASES,FOREIGN,CONSTITUTIONS,...
Using ner.manual, we collected about 1,400 annotations with different labels. After that, I trained the initial model using the commands below. The only difference between them is the max steps setting.
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
I guess what confused me is the score for the same label: for example, the F score of JOURNAL is 92.96 when max_steps=0, but 73.68 when max_steps=2000. Thank you for your thoughts!
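One quick note on reading those per-label tables: the F score is just the harmonic mean of precision and recall, F = 2 × P × R / (P + R). For example, JOURNAL's 73.68 in your 2000-step run follows directly from its precision and recall columns: 2 × 65.12 × 84.85 / (65.12 + 84.85) ≈ 73.68. So the per-label F scores move whenever the precision/recall trade-off for that label changes between runs.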
Could you try running these commands:
python -m prodigy train model-with-2000-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
python -m prodigy train model-with-1-step --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=1 --label-stats
python -m prodigy train model-with-0-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
I worry that the earlier results were skewed because both runs pointed to the same output folder. The commands above each save their model into its own folder.
Hello! I followed the commands you suggested and noticed that the results are quite similar to my previous trials. Can you please share your thoughts on this?
python -m prodigy train model-with-2000-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
  0     200       2719.65   3329.99   14.97   24.56   10.77    0.15
  0     400      13771.61   5036.63   15.46   23.44   11.54    0.15
  1     600      14911.51   4987.48   14.43   21.88   10.77    0.14
  2     800      24490.44   4489.47   36.51   39.64   33.85    0.37
  3    1000      23853.08   4820.55   29.29   32.11   26.92    0.29
  3    1200      19903.98   4142.67   53.22   60.19   47.69    0.53
  5    1400      40800.60   5401.68   42.98   46.43   40.00    0.43
  6    1600      41073.55   5112.98   50.00   51.64   48.46    0.50
  8    1800      53870.93   6966.36   44.08   46.96   41.54    0.44
 10    2000      66122.85   8084.27   58.27   59.68   56.92    0.58
✔ Saved pipeline to output directory
model-with-2000-steps\model-last
=============================== NER (per type) ===============================
                     P       R       F
JOURNAL          65.12   84.85   73.68
WORKING           0.00    0.00    0.00
MEDIA            20.00   16.67   18.18
BOOK             66.67   50.00   57.14
CASE_S           58.62   80.95   68.00
WEBSITES         55.56   27.78   37.04
STATUTES         83.33   83.33   83.33
COURT            50.00   33.33   40.00
DATASETS          0.00    0.00    0.00
AGENCY_D          0.00    0.00    0.00
CONSTITUTIONS     0.00    0.00    0.00
python -m prodigy train model-with-0-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
  0     200       2719.65   3329.99   14.97   24.56   10.77    0.15
  0     400      13771.61   5036.63   15.46   23.44   11.54    0.15
  1     600      14911.51   4987.48   14.43   21.88   10.77    0.14
  2     800      24490.44   4489.47   36.51   39.64   33.85    0.37
  3    1000      23853.08   4820.55   29.29   32.11   26.92    0.29
  3    1200      19903.98   4142.67   53.22   60.19   47.69    0.53
  5    1400      40800.60   5401.68   42.98   46.43   40.00    0.43
  6    1600      41073.55   5112.98   50.00   51.64   48.46    0.50
  8    1800      53870.93   6966.36   44.08   46.96   41.54    0.44
 10    2000      66122.85   8084.27   58.27   59.68   56.92    0.58
 12    2200      62738.91   6870.88   62.02   62.50   61.54    0.62
 15    2400      72402.73   6430.36   65.61   67.48   63.85    0.66
 19    2600      75198.70   6506.38   68.68   67.41   70.00    0.69
 22    2800     105203.06   9315.15   71.04   71.32   70.77    0.71
 26    3000      70133.46   5546.94   70.99   70.45   71.54    0.71
 29    3200      65737.74   5291.23   77.22   77.52   76.92    0.77
 33    3400      50814.73   3965.48   80.30   79.10   81.54    0.80
 36    3600      51978.00   3492.22   83.33   82.09   84.62    0.83
 40    3800      44422.39   3439.81   85.71   86.05   85.38    0.86
 43    4000      38814.58   2563.37   86.36   85.07   87.69    0.86
 47    4200      69109.05   2896.03   83.40   83.72   83.08    0.83
 51    4400      47747.09   2280.28   84.17   84.50   83.85    0.84
 54    4600      24583.09   1518.34   87.79   87.12   88.46    0.88
 58    4800      23161.80   1472.89   86.49   86.82   86.15    0.86
 61    5000      36885.37   1092.25   89.39   88.06   90.77    0.89
 65    5200      20819.10    757.23   90.15   88.81   91.54    0.90
 68    5400      18255.33    641.18   87.88   86.57   89.23    0.88
 72    5600      17563.40    499.21   89.15   89.84   88.46    0.89
 75    5800       5718.46    393.41   90.49   89.47   91.54    0.90
 79    6000      17640.38    315.97   87.79   87.12   88.46    0.88
 82    6200      25171.84    532.83   89.58   89.92   89.23    0.90
 86    6400      18583.70    426.25   88.89   88.55   89.23    0.89
 89    6600      27922.14    630.66   89.23   89.23   89.23    0.89
 93    6800      24200.22    401.64   89.81   88.15   91.54    0.90
 96    7000      18341.90    345.09   87.88   86.57   89.23    0.88
100    7200      18282.35    246.37   88.30   86.67   90.00    0.88
103    7400      67315.05    359.63   88.80   89.15   88.46    0.89
✔ Saved pipeline to output directory
model-with-0-steps\model-last
=============================== NER (per type) ===============================
                     P       R       F
JOURNAL          86.84  100.00   92.96
WORKING         100.00   50.00   66.67
MEDIA            81.82   75.00   78.26
WEBSITES         94.12   88.89   91.43
BOOK             94.74   90.00   92.31
STATUTES         85.71  100.00   92.31
COURT            71.43   83.33   76.92
DATASETS         50.00  100.00   66.67
CASE_S          100.00   95.24   97.56
AGENCY_D        100.00  100.00  100.00
CONSTITUTIONS   100.00  100.00  100.00
Ah, OK! Now I see. I misread one of your earlier comments, sorry about that! I now understand that it runs the full training loop when the max steps value is set to zero. I initially had the impression this was about fluctuations in the score. Again, my bad!
I just ran everything myself locally, and it seems like we've caught a small bug. I was able to reproduce this on a simple NER example in Prodigy, but also on a textcat use case that just relies on spaCy. Because the behavior also occurs there, it feels safe to say this issue isn't caused by Prodigy.
Here's an example run with 0 steps.
python -m spacy train data/dataset/config.cfg --output training/dataset/ --paths.train data/dataset/train.spacy --paths.dev data/dataset/dev.spacy --training.max_steps=0
Notice how many steps it runs:
============================= Training pipeline =============================
ℹ Pipeline: ['textcat_multilabel']
ℹ Initial learn rate: 0.001
E    #       LOSS TEXTC...  CATS_SCORE  SCORE
---  ------  -------------  ----------  ------
  0       0           0.25       57.55    0.58
  0     200          51.73       88.56    0.89
  1     400           7.81       91.20    0.91
  2     600           1.40       92.23    0.92
  3     800           0.60       91.50    0.91
  4    1000           0.27       91.64    0.92
  5    1200           0.22       91.50    0.91
  6    1400           0.14       91.20    0.91
  7    1600           0.14       91.35    0.91
  8    1800           0.10       91.64    0.92
  9    2000           0.09       91.20    0.91
 10    2200           0.08       91.50    0.91
✔ Saved pipeline to output directory
But here's the same one with only one step.
python -m spacy train data/dataset/config.cfg --output training/dataset/ --paths.train data/dataset/train.spacy --paths.dev data/dataset/dev.spacy --training.max_steps=1
And now we see far fewer:
============================= Training pipeline =============================
ℹ Pipeline: ['textcat_multilabel']
ℹ Initial learn rate: 0.001
E    #       LOSS TEXTC...  CATS_SCORE  SCORE
---  ------  -------------  ----------  ------
  0       0           0.25       57.55    0.58
✔ Saved pipeline to output directory
It seems that when the maximum number of steps is set to zero, it assumes a default behavior instead.
Again, sorry about the unnecessary back and forth. It seems like --training.max_steps=0 triggers an unintuitive default behavior. Just to check, though: when you run --training.max_steps=1, do you see the same behavior as me?
Thank you so much for looking into this! I reran the commands with max_steps=0, max_steps=1, and max_steps=2000. For max_steps=0 and max_steps=2000, the results remain the same. For max_steps=1, my results are:
(ENV) PS C:\Users\rapiduser\Workspace\zhang-annotations> python -m prodigy train model-with-1-step --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=1 --label-stats
Using CPU
========================= Generating Prodigy config =========================
Auto-generating config with spaCy
Generated training config
=========================== Initializing pipeline ===========================
[2023-03-13 23:32:28,829] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
[2023-03-13 23:32:29,126] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-03-13 23:32:29,142] [INFO] Created vocabulary
[2023-03-13 23:32:29,142] [INFO] Finished initializing nlp object
[2023-03-13 23:32:31,798] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
Initialized pipeline
============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
Pipeline: ['tok2vec', 'ner']
Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
Saved pipeline to output directory
model-with-1-step\model-last
=============================== NER (per type) ===============================
                     P       R       F
JOURNAL           0.00    0.00    0.00
WORKING           0.00    0.00    0.00
MEDIA             0.00    0.00    0.00
WEBSITES          0.00    0.00    0.00
BOOK              0.00    0.00    0.00
STATUTES          0.00    0.00    0.00
COURT             0.00    0.00    0.00
DATASETS          0.00    0.00    0.00
CASE_S            0.00    0.00    0.00
AGENCY_D          0.00    0.00    0.00
CONSTITUTIONS     0.00    0.00    0.00
I just checked in with a spaCy colleague, who pointed me to the spaCy API docs, where this behavior is documented. To quote what is mentioned there for the max_steps argument:

Maximum number of update steps to train for. 0 means an unlimited number of steps. Defaults to 20000.

So that explains the behavior!
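For future readers: the override maps onto the [training] block of the config that Prodigy generates, which is also where the early-stopping settings live. A short sketch of the relevant keys, assuming spaCy's default values:

[training]
# 0 means "no step limit", not "train for zero steps"
max_steps = 20000
# stop once the score hasn't improved for this many steps
patience = 1600
# 0 likewise means no epoch limit
max_epochs = 0
# evaluate on the dev set every 200 steps
eval_frequency = 200

This also explains why the max_steps=0 run above didn't literally run forever: its best checkpoint (ENTS_F 90.49) landed at step 5800, and training stopped at step 7400, exactly 1600 steps later, consistent with the default patience kicking in.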
That makes sense. Thank you!