Hello, I am training an NER model with the prodigy train command. I found that the score varies when I change --training.max_steps: e.g., the score with training.max_steps=1000 is 0.29, with training.max_steps=2000 it is 0.58, and it goes up to 0.9 when --training.max_steps=0. I do not quite understand why. Could you help me understand the training.max_steps setting and its impact on the model performance metrics? Thank you!
Hi there!
Could you share the train commands as well as the output of the train call? It sounds somewhat strange that --training.max_steps=0 would ever lead to a higher score, unless you're running the train command with a pretrained model that's already learned from the data.
Of course. Thank you for your thoughts!
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
When I look at your output, it seems that the scores and losses can fluctuate, but that they trend in the right direction over time. That's normal. This can be due to gradient descent overstepping, and there are other numeric phenomena that can cause this effect as well.
This is also why the training procedure stores two models: model-best and model-last. The most recent iteration of the model might not be the best-performing one, so the best checkpoint is kept separately. This way, you won't need to worry too much about max_steps.
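If you ever want to compare the two checkpoints directly, you can load them with spaCy. Here's a minimal sketch, assuming the output directory from your train command (the example sentence is made up):

import spacy

# Prodigy's train recipe saves two pipelines in the output directory:
# model-best (the highest-scoring evaluation) and model-last (the final step)
nlp_best = spacy.load("citation_2nd_1_model/model-best")
nlp_last = spacy.load("citation_2nd_1_model/model-last")

# hypothetical input, just to eyeball the difference between the two
doc = nlp_best("See Smith v. Jones, 123 F.3d 456 (2d Cir. 1997).")
print([(ent.text, ent.label_) for ent in doc.ents])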
Does this help, or is something still unclear? If so, could you point to a specific metric in the loss table that's confusing and explain why?
Thank you for explaining that. It is helpful!
I guess my confusion comes from looking at my detailed stats. As we annotated our data with different labels, the scores of the different labels differ a lot when max steps is set differently. I was wondering if you could help me understand that? Thank you!
max_steps=0
max_steps=2000
Just to check: when you run the stats for a model with --training.max_steps=0, are you sure it's not picking up a pre-trained model? Also, could you elaborate on what you mean by "we annotated our data with different labels"? If you're training models on two different sets of data, that would also help explain the difference.
Could you also share the commands that you use to generate these tables? Also, please feel free to post the stats as inline code instead of screenshots. It's much easier to search/copy/paste in text.
We used ner.manual to annotate our data. The command is
python -m prodigy ner.manual citation_2nd_1 blank:en .\citation_2nd\To-annotate\beyond_profit.txt --label AGENCY_R,CASES,FOREIGN,CONSTITUTIONS,...
Using ner.manual, we collected about 1,400 annotations with different labels. After that, I trained the initial model using the commands below. The only difference between them is the max steps setting.
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
python -m prodigy train citation_2nd_1_model --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
I guess what confused me is the score for the same label: for example, the F score of JOURNAL is 92.96 when max_steps=0, but 73.68 when max_steps=2000. Thank you for your thoughts!
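One quick note on reading those per-label tables: the F score is just the harmonic mean of precision and recall, F = 2 × P × R / (P + R). For example, JOURNAL's 73.68 in your 2000-step run follows directly from its precision and recall columns: 2 × 65.12 × 84.85 / (65.12 + 84.85) ≈ 73.68. So the per-label F scores move whenever the precision/recall trade-off for that label changes between runs.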
Could you try running these commands:
python -m prodigy train model-with-2000-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
python -m prodigy train model-with-1-step --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=1 --label-stats
python -m prodigy train model-with-0-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
I worry that the earlier results were skewed because both runs pointed to the same output folder. The commands above each save their model into its own folder.
Hello! I followed the commands you suggested and noticed that the results are quite similar to my previous trials. Can you please share your thoughts on this?
python -m prodigy train model-with-2000-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=2000 --label-stats
============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
  0     200       2719.65   3329.99   14.97   24.56   10.77    0.15
  0     400      13771.61   5036.63   15.46   23.44   11.54    0.15
  1     600      14911.51   4987.48   14.43   21.88   10.77    0.14
  2     800      24490.44   4489.47   36.51   39.64   33.85    0.37
  3    1000      23853.08   4820.55   29.29   32.11   26.92    0.29
  3    1200      19903.98   4142.67   53.22   60.19   47.69    0.53
  5    1400      40800.60   5401.68   42.98   46.43   40.00    0.43
  6    1600      41073.55   5112.98   50.00   51.64   48.46    0.50
  8    1800      53870.93   6966.36   44.08   46.96   41.54    0.44
 10    2000      66122.85   8084.27   58.27   59.68   56.92    0.58
✔ Saved pipeline to output directory
model-with-2000-steps\model-last
=============================== NER (per type) ===============================
                     P       R       F
JOURNAL          65.12   84.85   73.68
WORKING           0.00    0.00    0.00
MEDIA            20.00   16.67   18.18
BOOK             66.67   50.00   57.14
CASE_S           58.62   80.95   68.00
WEBSITES         55.56   27.78   37.04
STATUTES         83.33   83.33   83.33
COURT            50.00   33.33   40.00
DATASETS          0.00    0.00    0.00
AGENCY_D          0.00    0.00    0.00
CONSTITUTIONS     0.00    0.00    0.00
python -m prodigy train model-with-0-steps --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=0 --label-stats
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
  0     200       2719.65   3329.99   14.97   24.56   10.77    0.15
  0     400      13771.61   5036.63   15.46   23.44   11.54    0.15
  1     600      14911.51   4987.48   14.43   21.88   10.77    0.14
  2     800      24490.44   4489.47   36.51   39.64   33.85    0.37
  3    1000      23853.08   4820.55   29.29   32.11   26.92    0.29
  3    1200      19903.98   4142.67   53.22   60.19   47.69    0.53
  5    1400      40800.60   5401.68   42.98   46.43   40.00    0.43
  6    1600      41073.55   5112.98   50.00   51.64   48.46    0.50
  8    1800      53870.93   6966.36   44.08   46.96   41.54    0.44
 10    2000      66122.85   8084.27   58.27   59.68   56.92    0.58
 12    2200      62738.91   6870.88   62.02   62.50   61.54    0.62
 15    2400      72402.73   6430.36   65.61   67.48   63.85    0.66
 19    2600      75198.70   6506.38   68.68   67.41   70.00    0.69
 22    2800     105203.06   9315.15   71.04   71.32   70.77    0.71
 26    3000      70133.46   5546.94   70.99   70.45   71.54    0.71
 29    3200      65737.74   5291.23   77.22   77.52   76.92    0.77
 33    3400      50814.73   3965.48   80.30   79.10   81.54    0.80
 36    3600      51978.00   3492.22   83.33   82.09   84.62    0.83
 40    3800      44422.39   3439.81   85.71   86.05   85.38    0.86
 43    4000      38814.58   2563.37   86.36   85.07   87.69    0.86
 47    4200      69109.05   2896.03   83.40   83.72   83.08    0.83
 51    4400      47747.09   2280.28   84.17   84.50   83.85    0.84
 54    4600      24583.09   1518.34   87.79   87.12   88.46    0.88
 58    4800      23161.80   1472.89   86.49   86.82   86.15    0.86
 61    5000      36885.37   1092.25   89.39   88.06   90.77    0.89
 65    5200      20819.10    757.23   90.15   88.81   91.54    0.90
 68    5400      18255.33    641.18   87.88   86.57   89.23    0.88
 72    5600      17563.40    499.21   89.15   89.84   88.46    0.89
 75    5800       5718.46    393.41   90.49   89.47   91.54    0.90
 79    6000      17640.38    315.97   87.79   87.12   88.46    0.88
 82    6200      25171.84    532.83   89.58   89.92   89.23    0.90
 86    6400      18583.70    426.25   88.89   88.55   89.23    0.89
 89    6600      27922.14    630.66   89.23   89.23   89.23    0.89
 93    6800      24200.22    401.64   89.81   88.15   91.54    0.90
 96    7000      18341.90    345.09   87.88   86.57   89.23    0.88
100    7200      18282.35    246.37   88.30   86.67   90.00    0.88
103    7400      67315.05    359.63   88.80   89.15   88.46    0.89
✔ Saved pipeline to output directory
model-with-0-steps\model-last
=============================== NER (per type) ===============================
                     P       R       F
JOURNAL          86.84  100.00   92.96
WORKING         100.00   50.00   66.67
MEDIA            81.82   75.00   78.26
WEBSITES         94.12   88.89   91.43
BOOK             94.74   90.00   92.31
STATUTES         85.71  100.00   92.31
COURT            71.43   83.33   76.92
DATASETS         50.00  100.00   66.67
CASE_S          100.00   95.24   97.56
AGENCY_D        100.00  100.00  100.00
CONSTITUTIONS   100.00  100.00  100.00
Ah, OK! Now I see. I misread one of your earlier comments, sorry about that! I now understand that it runs the full training loop when the max steps value is set to zero. I initially had the impression this was about fluctuations in the score. Again, my bad!
I just ran everything myself locally, and it seems like we've caught a small bug. I was able to reproduce this on a simple NER example in Prodigy, but also on a textcat use case that just relies on spaCy. Because the behavior also occurs there, it feels safe to say this issue isn't caused by Prodigy.
Here's an example run with 0 steps.
python -m spacy train data/dataset/config.cfg --output training/dataset/ --paths.train data/dataset/train.spacy --paths.dev data/dataset/dev.spacy --training.max_steps=0
Notice how many steps it runs:
============================= Training pipeline =============================
ℹ Pipeline: ['textcat_multilabel']
ℹ Initial learn rate: 0.001
E    #       LOSS TEXTC...  CATS_SCORE  SCORE
---  ------  -------------  ----------  ------
  0       0           0.25       57.55    0.58
  0     200          51.73       88.56    0.89
  1     400           7.81       91.20    0.91
  2     600           1.40       92.23    0.92
  3     800           0.60       91.50    0.91
  4    1000           0.27       91.64    0.92
  5    1200           0.22       91.50    0.91
  6    1400           0.14       91.20    0.91
  7    1600           0.14       91.35    0.91
  8    1800           0.10       91.64    0.92
  9    2000           0.09       91.20    0.91
 10    2200           0.08       91.50    0.91
✔ Saved pipeline to output directory
But here's the same one with only one step.
python -m spacy train data/dataset/config.cfg --output training/dataset/ --paths.train data/dataset/train.spacy --paths.dev data/dataset/dev.spacy --training.max_steps=1
And now we see far fewer:
============================= Training pipeline =============================
ℹ Pipeline: ['textcat_multilabel']
ℹ Initial learn rate: 0.001
E    #       LOSS TEXTC...  CATS_SCORE  SCORE
---  ------  -------------  ----------  ------
  0       0           0.25       57.55    0.58
✔ Saved pipeline to output directory
It seems that when the maximum number of steps is set to zero, it assumes a default behavior instead.
Again, sorry about the unnecessary back and forth. It seems like --training.max_steps=0 triggers an unintuitive default behavior. Just to check, though: when you run --training.max_steps=1, do you see the same behavior as me?
Thank you so much for looking into this! I reran the commands with max_steps=0, max_steps=1, and max_steps=2000. For max_steps=0 and max_steps=2000, the results remain the same. For max_steps=1, my results are:
(ENV) PS C:\Users\rapiduser\Workspace\zhang-annotations> python -m prodigy train model-with-1-step --ner citation_2nd_1_train,eval:citation_2nd_1_eval --training.max_steps=1 --label-stats
Using CPU
========================= Generating Prodigy config =========================
Auto-generating config with spaCy
Generated training config
=========================== Initializing pipeline ===========================
[2023-03-13 23:32:28,829] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
[2023-03-13 23:32:29,126] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-03-13 23:32:29,142] [INFO] Created vocabulary
[2023-03-13 23:32:29,142] [INFO] Finished initializing nlp object
[2023-03-13 23:32:31,798] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
Initialized pipeline
============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 1150 | Evaluation: 278 (from datasets)
Training: 1145 | Evaluation: 278
Labels: ner (16)
Pipeline: ['tok2vec', 'ner']
Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    128.55    0.00    0.00    0.00    0.00
Saved pipeline to output directory
model-with-1-step\model-last
=============================== NER (per type) ===============================
                     P       R       F
JOURNAL           0.00    0.00    0.00
WORKING           0.00    0.00    0.00
MEDIA             0.00    0.00    0.00
WEBSITES          0.00    0.00    0.00
BOOK              0.00    0.00    0.00
STATUTES          0.00    0.00    0.00
COURT             0.00    0.00    0.00
DATASETS          0.00    0.00    0.00
CASE_S            0.00    0.00    0.00
AGENCY_D          0.00    0.00    0.00
CONSTITUTIONS     0.00    0.00    0.00
I just checked in with a spaCy colleague, who pointed me to the spaCy API docs, where this behavior is documented. To quote what is mentioned there for the max_steps argument:

Maximum number of update steps to train for. 0 means an unlimited number of steps. Defaults to 20000.

So that explains the behavior!
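For future readers: the override maps onto the [training] block of the config that Prodigy generates, which is also where the early-stopping settings live. A short sketch of the relevant keys, assuming spaCy's default values:

[training]
# 0 means "no step limit", not "train for zero steps"
max_steps = 20000
# stop once the score hasn't improved for this many steps
patience = 1600
# 0 likewise means no epoch limit
max_epochs = 0
# evaluate on the dev set every 200 steps
eval_frequency = 200

This also explains why the max_steps=0 run above didn't literally run forever: its best checkpoint (ENTS_F 90.49) landed at step 5800, and training stopped at step 7400, exactly 1600 steps later, consistent with the default patience kicking in.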
That makes sense. Thank you!