Why is the prodigy train result different from the spacy train result?

hi @sigitpurnomo!

Sorry for the delayed response. We're trying to close out old issues.

To compare prodigy train and spacy train, I recommend checking out the Prodigy sample project:

If you clone this repo, you can run two examples to compare spacy train and prodigy train.

Using the sample fashion data, you can run spacy train with:

python -m spacy project run all 

This will load the data (db-in), export the data and config file (data-to-spacy), and run spacy train (see the project.yml).
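If you want to sanity-check the inputs before training, you can tally the "answer" field in the exported JSONL yourself. This is a minimal stdlib-only sketch (the helper name is ours, not a Prodigy API); on the sample assets it should report the same 1235 and 500 totals that db-in prints in the log below.

```python
import json

def count_answers(jsonl_path):
    """Tally the "answer" field across a Prodigy JSONL export.

    Each line is one annotation task; Prodigy stores the annotator's
    decision as "accept", "reject", or "ignore".
    (count_answers is our own helper, not part of the Prodigy API.)
    """
    counts = {}
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            answer = json.loads(line).get("answer", "<missing>")
            counts[answer] = counts.get(answer, 0) + 1
    return counts

# e.g. count_answers("assets/fashion_brands_training.jsonl")
```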

python -m spacy project run all
ℹ Running workflow 'all'

=================================== db-in ===================================
Running command: /opt/homebrew/opt/python@3.10/bin/python3.10 -m prodigy db-in fashion_brands_training assets/fashion_brands_training.jsonl
✔ Created dataset 'fashion_brands_training' in database SQLite
✔ Imported 1235 annotations to 'fashion_brands_training' (session
2023-02-03_15-10-32) in database SQLite
Found and keeping existing "answer" in 1235 examples
Running command: /opt/homebrew/opt/python@3.10/bin/python3.10 -m prodigy db-in fashion_brands_eval assets/fashion_brands_eval.jsonl
✔ Created dataset 'fashion_brands_eval' in database SQLite
✔ Imported 500 annotations to 'fashion_brands_eval' (session
2023-02-03_15-10-33) in database SQLite
Found and keeping existing "answer" in 500 examples

=============================== data-to-spacy ===============================
Running command: /opt/homebrew/opt/python@3.10/bin/python3.10 -m prodigy data-to-spacy corpus/ --ner fashion_brands_training,eval:fashion_brands_eval
ℹ Using language 'en'

============================== Generating data ==============================
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 1235 | Evaluation: 500 (from datasets)
Training: 1235 | Evaluation: 500
Labels: ner (1)
✔ Saved 1235 training examples
corpus/train.spacy
✔ Saved 500 evaluation examples
corpus/dev.spacy

============================= Generating config =============================
ℹ Auto-generating config with spaCy
✔ Generated training config

======================== Generating cached label data ========================
✔ Saving label data for component 'ner'
corpus/labels/ner.json

============================= Finalizing export =============================
✔ Saved training config
corpus/config.cfg

To use this data for training with spaCy, you can run:
python -m spacy train corpus/config.cfg --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy

================================ train_spacy ================================
Running command: /opt/homebrew/opt/python@3.10/bin/python3.10 -m spacy train configs/config.cfg --output training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy --gpu-id -1
ℹ Saving to output directory: training
ℹ Using CPU
ℹ To switch to GPU 0, use the option: --gpu-id 0

=========================== Initializing pipeline ===========================
[2023-02-03 15:10:39,792] [INFO] Set up nlp object from config
[2023-02-03 15:10:39,799] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-02-03 15:10:39,801] [INFO] Created vocabulary
[2023-02-03 15:10:39,802] [INFO] Finished initializing nlp object
[2023-02-03 15:10:41,529] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline

============================= Training pipeline =============================
โ„น Pipeline: ['tok2vec', 'ner']
โ„น Initial learn rate: 0.0
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     46.17    0.00    0.00    0.00    0.00
  0     200         10.44  14143.08    0.00    0.00    0.00    0.00                                                                                                                               
  0     400         17.36    921.48    0.00    0.00    0.00    0.00                                                                                                                               
  1     600         18.70    517.74    0.00    0.00    0.00    0.00                                                                                                                               
  1     800         22.26    619.64    0.83   50.00    0.42    0.01                                                                                                                               
  2    1000         26.61    656.45    4.84   60.00    2.52    0.05                                                                                                                               
  3    1200         29.64    745.70    9.67   41.94    5.46    0.10                                                                                                                               
  4    1400         37.73    754.50   20.98   47.76   13.45    0.21                                                                                                                               
  6    1600         82.65    884.78   30.59   46.96   22.69    0.31                                                                                                                               
  7    1800        391.86    984.87   36.60   49.64   28.99    0.37                                                                                                                               
  9    2000        354.41   1072.19   39.60   48.19   33.61    0.40                                                                                                                               
 12    2200        107.65    988.55   41.21   51.25   34.45    0.41                                                                                                                               
 15    2400        138.04   1029.82   47.12   55.06   41.18    0.47                                                                                                                               
 19    2600        149.17    955.62   50.24   57.61   44.54    0.50                                                                                                                               
 22    2800        124.06    703.44   50.84   59.22   44.54    0.51                                                                                                                               
 25    3000        121.32    583.64   53.72   62.57   47.06    0.54                                                                                                                               
 29    3200        112.32    431.85   54.55   63.33   47.90    0.55                                                                                                                               
 32    3400        115.82    384.64   55.77   65.17   48.74    0.56                                                                                                                               
 35    3600        122.27    307.42   55.50   64.44   48.74    0.56                                                                                                                               
 38    3800        124.70    295.25   57.84   69.41   49.58    0.58                                                                                                                               
 42    4000        153.26    254.92   57.56   68.60   49.58    0.58                                                                                                                               
 45    4200        183.82    225.83   57.63   68.00   50.00    0.58                                                                                                                               
 48    4400        191.45    206.76   57.62   66.48   50.84    0.58                                                                                                                               
 52    4600        183.82    170.08   57.42   66.67   50.42    0.57                                                                                                                               
 55    4800        104.11    106.09   57.76   66.85   50.84    0.58                                                                                                                               
 58    5000        132.83     96.88   57.97   68.18   50.42    0.58                                                                                                                               
 62    5200        104.80     78.27   59.51   70.93   51.26    0.60                                                                                                                               
 65    5400         94.62     77.89   59.66   71.35   51.26    0.60                                                                                                                               
 68    5600         88.30     58.62   59.51   70.93   51.26    0.60                                                                                                                               
 72    5800         91.84     43.24   60.00   71.51   51.68    0.60                                                                                                                               
 75    6000        132.88     50.87   59.86   68.85   52.94    0.60                                                                                                                               
 78    6200         77.27     42.82   60.59   73.21   51.68    0.61                                                                                                                               
 82    6400         73.68     33.23   60.78   72.94   52.10    0.61                                                                                                                               
 85    6600         79.77     29.21   61.65   72.99   53.36    0.62                                                                                                                               
 88    6800        125.10     44.11   61.69   72.32   53.78    0.62                                                                                                                               
 91    7000         62.31     29.18   61.95   73.84   53.36    0.62                                                                                                                               
 95    7200         44.03     19.51   61.99   73.14   53.78    0.62                                                                                                                               
 98    7400         46.05     15.76   60.98   72.67   52.52    0.61                                                                                                                               
101    7600         43.38     10.81   62.20   72.22   54.62    0.62                                                                                                                               
105    7800         25.63     10.48   58.65   72.67   49.16    0.59                                                                                                                               
108    8000         92.39     25.84   62.35   72.63   54.62    0.62                                                                                                                               
111    8200         27.62      9.18   62.65   73.45   54.62    0.63                                                                                                                               
115    8400         40.35     11.85   62.14   73.56   53.78    0.62                                                                                                                               
118    8600         24.75      8.94   62.05   71.82   54.62    0.62                                                                                                                               
121    8800         32.70     10.96   61.72   71.67   54.20    0.62                                                                                                                               
125    9000         23.91      7.12   61.24   71.11   53.78    0.61                                                                                                                               
128    9200         31.73     10.01   61.24   71.11   53.78    0.61                                                                                                                               
131    9400         65.21     20.19   61.72   71.67   54.20    0.62                                                                                                                               
134    9600         11.40      3.41   61.54   71.91   53.78    0.62                                                                                                                               
138    9800         21.41      6.48   61.69   72.32   53.78    0.62                                                                                                                               
✔ Saved pipeline to output directory
training/model-last

Alternatively, you can run prodigy train on the same data by running the all_prodigy workflow:

$ python3 -m spacy project run all_prodigy
ℹ Running workflow 'all_prodigy'

=================================== db-in ===================================
Running command: /opt/homebrew/opt/python@3.10/bin/python3.10 -m prodigy db-in fashion_brands_training assets/fashion_brands_training.jsonl
✔ Imported 1235 annotations to 'fashion_brands_training' (session
2023-02-03_15-19-02) in database SQLite
Found and keeping existing "answer" in 1235 examples
Running command: /opt/homebrew/opt/python@3.10/bin/python3.10 -m prodigy db-in fashion_brands_eval assets/fashion_brands_eval.jsonl
✔ Imported 500 annotations to 'fashion_brands_eval' (session
2023-02-03_15-19-04) in database SQLite
Found and keeping existing "answer" in 500 examples

=============================== train_prodigy ===============================
Running command: /opt/homebrew/opt/python@3.10/bin/python3.10 -m prodigy train training/ --ner fashion_brands_training,eval:fashion_brands_eval --config configs/config.cfg --gpu-id -1
ℹ Using CPU
ℹ To switch to GPU 0, use the option: --gpu-id 0

========================= Generating Prodigy config =========================
✔ Generated training config

=========================== Initializing pipeline ===========================
[2023-02-03 15:19:05,519] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 2470 | Evaluation: 1000 (from datasets)
Training: 1235 | Evaluation: 500
Labels: ner (1)
[2023-02-03 15:19:05,818] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-02-03 15:19:05,820] [INFO] Created vocabulary
[2023-02-03 15:19:05,821] [INFO] Finished initializing nlp object
[2023-02-03 15:19:06,960] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline

============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 2470 | Evaluation: 1000 (from datasets)
Training: 1235 | Evaluation: 500
Labels: ner (1)
โ„น Pipeline: ['tok2vec', 'ner']
โ„น Initial learn rate: 0.0
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     46.17    0.00    0.00    0.00    0.00
  0     200         10.44  14143.08    0.00    0.00    0.00    0.00
  0     400         17.36    921.48    0.00    0.00    0.00    0.00
  1     600         18.70    517.74    0.00    0.00    0.00    0.00
  1     800         22.26    619.64    0.83   50.00    0.42    0.01
  2    1000         26.61    656.45    4.84   60.00    2.52    0.05
  3    1200         29.64    745.70    9.67   41.94    5.46    0.10
  4    1400         37.73    754.50   20.98   47.76   13.45    0.21
  6    1600         82.65    884.78   30.59   46.96   22.69    0.31
  7    1800        391.86    984.87   36.60   49.64   28.99    0.37
  9    2000        354.41   1072.19   39.60   48.19   33.61    0.40
 12    2200        107.65    988.55   41.21   51.25   34.45    0.41
 15    2400        138.04   1029.82   47.12   55.06   41.18    0.47
 19    2600        149.17    955.62   50.24   57.61   44.54    0.50
 22    2800        124.06    703.44   50.84   59.22   44.54    0.51
 25    3000        121.32    583.64   53.72   62.57   47.06    0.54
 29    3200        112.32    431.85   54.55   63.33   47.90    0.55
 32    3400        115.82    384.64   55.77   65.17   48.74    0.56
 35    3600        122.27    307.42   55.50   64.44   48.74    0.56
 38    3800        124.70    295.25   57.84   69.41   49.58    0.58
 42    4000        153.26    254.92   57.56   68.60   49.58    0.58
 45    4200        183.82    225.83   57.63   68.00   50.00    0.58
 48    4400        191.45    206.76   57.62   66.48   50.84    0.58
 52    4600        183.82    170.08   57.42   66.67   50.42    0.57
 55    4800        104.11    106.09   57.76   66.85   50.84    0.58
 58    5000        132.83     96.88   57.97   68.18   50.42    0.58
 62    5200        104.80     78.27   59.51   70.93   51.26    0.60
 65    5400         94.62     77.89   59.66   71.35   51.26    0.60
 68    5600         88.30     58.62   59.51   70.93   51.26    0.60
 72    5800         91.84     43.24   60.00   71.51   51.68    0.60
 75    6000        132.88     50.87   59.86   68.85   52.94    0.60
 78    6200         77.27     42.82   60.59   73.21   51.68    0.61
 82    6400         73.68     33.23   60.78   72.94   52.10    0.61
 85    6600         79.77     29.21   61.65   72.99   53.36    0.62
 88    6800        125.10     44.11   61.69   72.32   53.78    0.62
 91    7000         62.31     29.18   61.95   73.84   53.36    0.62
 95    7200         44.03     19.51   61.99   73.14   53.78    0.62
 98    7400         46.05     15.76   60.98   72.67   52.52    0.61
101    7600         43.38     10.81   62.20   72.22   54.62    0.62
105    7800         25.63     10.48   58.65   72.67   49.16    0.59
108    8000         92.39     25.84   62.35   72.63   54.62    0.62
111    8200         27.62      9.18   62.65   73.45   54.62    0.63
115    8400         40.35     11.85   62.14   73.56   53.78    0.62
118    8600         24.75      8.94   62.05   71.82   54.62    0.62
121    8800         32.70     10.96   61.72   71.67   54.20    0.62
125    9000         23.91      7.12   61.24   71.11   53.78    0.61
128    9200         31.73     10.01   61.24   71.11   53.78    0.61
131    9400         65.21     20.19   61.72   71.67   54.20    0.62
134    9600         11.40      3.41   61.54   71.91   53.78    0.62
138    9800         21.41      6.48   61.69   72.32   53.78    0.62
✔ Saved pipeline to output directory
training/model-last

From these two examples, you should get the same results! Both runs train on the same exported data with the same config (including the random seed), which is why the score tables match line for line.

Here are the versions:

$ python -m prodigy stats
============================== ✨  Prodigy Stats ==============================

Version          1.11.10                       
Location         /opt/homebrew/lib/python3.10/site-packages/prodigy   
Platform         macOS-13.0.1-arm64-arm-64bit  
Python Version   3.10.8

$ python -m spacy info

============================== Info about spaCy ==============================

spaCy version    3.5.0                         
Location         /opt/homebrew/lib/python3.10/site-packages/spacy
Platform         macOS-13.0.1-arm64-arm-64bit  
Python version   3.10.8