NEL: Getting train, validation data during training

I've had success training a NEL model on Google Colab using a modified version of the example notebook provided by @sofie-vl: nel/large_data.ipynb at main · 12dmj/nel · GitHub

What I haven't been able to do with the notebook is get the loss and accuracy for both the test and train data so that I can plot them for my school report. I am able to get the loss, which I can graph, but I can't compare it to the training data. I'd like my report to look similar to:

I've tried using Scorer, but it hasn't worked with my code.

I can get the f-score from the command line by running the project pipeline with python -m spacy project run all. But the f-score that is output doesn't seem to improve:

E    #       LOSS ENTIT...  SENTS_F  SENTS_P  SENTS_R  ENTS_F  ENTS_P  ENTS_R  NEL_MICRO_F  NEL_MICRO_R  NEL_MICRO_P  SCORE 
---  ------  -------------  -------  -------  -------  ------  ------  ------  -----------  -----------  -----------  ------
  0       0           2.04    24.70    16.89    45.92    0.00    0.00    0.00        93.47        93.47        93.47    0.39
  0     200         115.47    24.70    16.89    45.92    0.00    0.00    0.00        93.47        93.47        93.47    0.39
  0     400          99.57    24.70    16.89    45.92    0.00    0.00    0.00        93.47        93.47        93.47    0.39
  0     600         103.47    24.70    16.89    45.92    0.00    0.00    0.00        93.47        93.47        93.47    0.39
  0     800         104.80    24.70    16.89    45.92    0.00    0.00    0.00        93.47        93.47        93.47    0.39
  0    1000         112.13    24.70    16.89    45.92    0.00    0.00    0.00        93.47        93.47        93.47    0.39
  0    1200         121.69    24.70    16.89    45.92    0.00    0.00    0.00        93.47        93.47        93.47    0.39
  0    1400         126.43    24.70    16.89    45.92    0.00    0.00    0.00        93.47        93.47        93.47    0.39

Whereas the loss data from the notebook is:

0 Losses {'entity_linker': 518.577010139823}
Time elapsed: 00:07:54.11 - at 2021-11-05 17:13:01.988861
5 Losses {'entity_linker': 282.10121862217784}
Time elapsed: 00:19:41.16 - at 2021-11-05 17:24:49.043269
10 Losses {'entity_linker': 235.08666918426752}
Time elapsed: 00:31:18.68 - at 2021-11-05 17:36:26.556032
15 Losses {'entity_linker': 208.01303820684552}
Time elapsed: 00:42:54.42 - at 2021-11-05 17:48:02.298448
20 Losses {'entity_linker': 192.7957752197981}
Time elapsed: 00:54:36.59 - at 2021-11-05 17:59:44.470805
25 Losses {'entity_linker': 179.44674559496343}
Time elapsed: 01:06:11.25 - at 2021-11-05 18:11:19.127751
30 Losses {'entity_linker': 171.86543752253056}
Time elapsed: 01:17:44.96 - at 2021-11-05 18:22:52.839690
35 Losses {'entity_linker': 164.67195125389844}
Time elapsed: 01:29:18.39 - at 2021-11-05 18:34:26.271681
40 Losses {'entity_linker': 158.6659845309332}
Time elapsed: 01:41:07.95 - at 2021-11-05 18:46:15.831751
45 Losses {'entity_linker': 154.5890299268067}
Time elapsed: 01:53:06.22 - at 2021-11-05 18:58:14.097549
50 Losses {'entity_linker': 151.7500482723117}
Time elapsed: 02:05:10.49 - at 2021-11-05 19:10:18.367776
55 Losses {'entity_linker': 148.73269990086555}
Time elapsed: 02:17:17.06 - at 2021-11-05 19:22:24.936754
60 Losses {'entity_linker': 145.96235537715256}
Time elapsed: 02:29:13.19 - at 2021-11-05 19:34:21.067723
65 Losses {'entity_linker': 142.98663274757564}
Time elapsed: 02:41:00.52 - at 2021-11-05 19:46:08.397541
70 Losses {'entity_linker': 141.86793559789658}
Time elapsed: 02:52:42.45 - at 2021-11-05 19:57:50.333669
75 Losses {'entity_linker': 140.22107696905732}
Time elapsed: 03:04:25.14 - at 2021-11-05 20:09:33.020128
80 Losses {'entity_linker': 139.8034223932773}
Time elapsed: 03:16:25.00 - at 2021-11-05 20:21:32.875989
85 Losses {'entity_linker': 137.95739856828004}
Time elapsed: 03:28:19.03 - at 2021-11-05 20:33:26.904670
90 Losses {'entity_linker': 136.77905424684286}
Time elapsed: 03:40:13.77 - at 2021-11-05 20:45:21.647000
95 Losses {'entity_linker': 136.74791371263564}
Time elapsed: 03:52:11.12 - at 2021-11-05 20:57:19.003574
100 Losses {'entity_linker': 134.7272301800549}
Time elapsed: 04:04:05.86 - at 2021-11-05 21:09:13.740258
105 Losses {'entity_linker': 134.6887142751366}
Time elapsed: 04:15:55.32 - at 2021-11-05 21:21:03.201959
110 Losses {'entity_linker': 132.8789926264435}
Time elapsed: 04:27:40.94 - at 2021-11-05 21:32:48.815430
115 Losses {'entity_linker': 133.09616295807064}
Time elapsed: 04:39:17.54 - at 2021-11-05 21:44:25.415312
120 Losses {'entity_linker': 132.17877416871488}
Time elapsed: 04:50:51.26 - at 2021-11-05 21:55:59.138112
125 Losses {'entity_linker': 131.9255609298125}
Time elapsed: 05:02:24.06 - at 2021-11-05 22:07:31.936761
130 Losses {'entity_linker': 130.9680900387466}
Time elapsed: 05:13:57.14 - at 2021-11-05 22:19:05.017890
135 Losses {'entity_linker': 130.61831521056592}
Time elapsed: 05:25:30.34 - at 2021-11-05 22:30:38.216087
140 Losses {'entity_linker': 129.9402158074081}
Time elapsed: 05:37:34.98 - at 2021-11-05 22:42:42.854389
145 Losses {'entity_linker': 129.83658009581268}
Time elapsed: 05:49:43.88 - at 2021-11-05 22:54:51.763642
150 Losses {'entity_linker': 130.49756509438157}
Time elapsed: 06:01:55.24 - at 2021-11-05 23:07:03.115556
155 Losses {'entity_linker': 129.00444552488625}
Time elapsed: 06:14:05.00 - at 2021-11-05 23:19:12.880497
160 Losses {'entity_linker': 129.75691447872669}
Time elapsed: 06:26:13.94 - at 2021-11-05 23:31:21.819107
165 Losses {'entity_linker': 127.22069923765957}
Time elapsed: 06:38:24.69 - at 2021-11-05 23:43:32.563884
170 Losses {'entity_linker': 128.55165950022638}
Time elapsed: 06:50:31.92 - at 2021-11-05 23:55:39.796841
175 Losses {'entity_linker': 127.8774628546089}
Time elapsed: 07:02:38.39 - at 2021-11-06 00:07:46.268007
180 Losses {'entity_linker': 126.73333430942148}
Time elapsed: 07:14:45.28 - at 2021-11-06 00:19:53.162753
185 Losses {'entity_linker': 128.17208409309387}
Time elapsed: 07:26:50.75 - at 2021-11-06 00:31:58.632593
190 Losses {'entity_linker': 127.87618801370263}
Time elapsed: 07:38:58.56 - at 2021-11-06 00:44:06.441755
195 Losses {'entity_linker': 128.18922368716449}
Time elapsed: 07:51:03.61 - at 2021-11-06 00:56:11.492171
200 Losses {'entity_linker': 126.44857790507376}
Time elapsed: 08:03:10.35 - at 2021-11-06 01:08:18.228360
205 Losses {'entity_linker': 126.16584806889296}
Time elapsed: 08:15:17.99 - at 2021-11-06 01:20:25.872093
210 Losses {'entity_linker': 126.0028964728117}
Time elapsed: 08:27:26.07 - at 2021-11-06 01:32:33.943712
215 Losses {'entity_linker': 126.85218511987478}
Time elapsed: 08:39:33.80 - at 2021-11-06 01:44:41.679172
220 Losses {'entity_linker': 124.49108471721411}
Time elapsed: 08:51:40.70 - at 2021-11-06 01:56:48.577129
225 Losses {'entity_linker': 124.94386644475162}
Time elapsed: 09:03:45.68 - at 2021-11-06 02:08:53.554664
230 Losses {'entity_linker': 124.88238729164004}
Time elapsed: 09:15:50.71 - at 2021-11-06 02:20:58.588019
235 Losses {'entity_linker': 124.9762195115909}
Time elapsed: 09:27:56.12 - at 2021-11-06 02:33:03.998654
240 Losses {'entity_linker': 124.58911141008139}
Time elapsed: 09:40:02.03 - at 2021-11-06 02:45:09.905797
245 Losses {'entity_linker': 125.0563224023208}
Time elapsed: 09:52:07.99 - at 2021-11-06 02:57:15.866987
250 Losses {'entity_linker': 124.43282091896981}
Time elapsed: 10:04:13.39 - at 2021-11-06 03:09:21.268913
255 Losses {'entity_linker': 124.70789887569845}
Time elapsed: 10:16:20.07 - at 2021-11-06 03:21:27.948591
260 Losses {'entity_linker': 123.92688186373562}
Time elapsed: 10:28:29.66 - at 2021-11-06 03:33:37.538095
265 Losses {'entity_linker': 124.68445670697838}
Time elapsed: 10:40:32.98 - at 2021-11-06 03:45:40.861821
270 Losses {'entity_linker': 123.76532926037908}
Time elapsed: 10:52:36.12 - at 2021-11-06 03:57:43.997714
275 Losses {'entity_linker': 123.69187347963452}
Time elapsed: 11:04:40.26 - at 2021-11-06 04:09:48.136524
280 Losses {'entity_linker': 124.56343223433942}
Time elapsed: 11:16:41.17 - at 2021-11-06 04:21:49.052444
285 Losses {'entity_linker': 123.60267618205398}
Time elapsed: 11:28:47.92 - at 2021-11-06 04:33:55.795044
290 Losses {'entity_linker': 123.96062463335693}
Time elapsed: 11:40:52.13 - at 2021-11-06 04:46:00.006996
295 Losses {'entity_linker': 124.1748896278441}
Time elapsed: 11:52:55.35 - at 2021-11-06 04:58:03.227913
300 Losses {'entity_linker': 123.5451480569318}
Time elapsed: 12:04:59.47 - at 2021-11-06 05:10:07.346182
305 Losses {'entity_linker': 121.73879183921963}
Time elapsed: 12:17:03.37 - at 2021-11-06 05:22:11.247889
310 Losses {'entity_linker': 123.61264295782894}
Time elapsed: 12:29:09.44 - at 2021-11-06 05:34:17.319159
315 Losses {'entity_linker': 123.59118335414678}
Time elapsed: 12:41:13.85 - at 2021-11-06 05:46:21.724142
320 Losses {'entity_linker': 122.6433570375666}
Time elapsed: 12:53:17.72 - at 2021-11-06 05:58:25.596039
325 Losses {'entity_linker': 122.93811170663685}
Time elapsed: 13:05:21.36 - at 2021-11-06 06:10:29.240857
330 Losses {'entity_linker': 122.24370755720884}
Time elapsed: 13:17:25.44 - at 2021-11-06 06:22:33.317990
335 Losses {'entity_linker': 122.34091891720891}
Time elapsed: 13:29:28.63 - at 2021-11-06 06:34:36.512394
340 Losses {'entity_linker': 120.78926514089108}
Time elapsed: 13:41:31.94 - at 2021-11-06 06:46:39.815138
345 Losses {'entity_linker': 121.15306690800935}
Time elapsed: 13:53:33.74 - at 2021-11-06 06:58:41.614551
350 Losses {'entity_linker': 123.84123289491981}
Time elapsed: 14:05:36.74 - at 2021-11-06 07:10:44.618560
355 Losses {'entity_linker': 123.0095909498632}
Time elapsed: 14:17:39.51 - at 2021-11-06 07:22:47.388996
360 Losses {'entity_linker': 122.71735720802099}
Time elapsed: 14:29:45.08 - at 2021-11-06 07:34:52.955839
365 Losses {'entity_linker': 122.12927337177098}
Time elapsed: 14:41:50.06 - at 2021-11-06 07:46:57.934203
370 Losses {'entity_linker': 121.42857379093766}
Time elapsed: 14:53:55.13 - at 2021-11-06 07:59:03.010787
375 Losses {'entity_linker': 122.97060009092093}
Time elapsed: 15:05:59.73 - at 2021-11-06 08:11:07.609987
380 Losses {'entity_linker': 122.79607174172997}
Time elapsed: 15:18:04.22 - at 2021-11-06 08:23:12.100987
385 Losses {'entity_linker': 122.65427322778851}
Time elapsed: 15:30:07.50 - at 2021-11-06 08:35:15.383062
390 Losses {'entity_linker': 121.21647365763783}
Time elapsed: 15:42:11.37 - at 2021-11-06 08:47:19.243729
395 Losses {'entity_linker': 122.21294816490263}
Time elapsed: 15:54:15.59 - at 2021-11-06 08:59:23.467010
400 Losses {'entity_linker': 121.78573988005519}
Time elapsed: 16:06:19.88 - at 2021-11-06 09:11:27.755565
405 Losses {'entity_linker': 122.31534681003541}
Time elapsed: 16:18:22.67 - at 2021-11-06 09:23:30.549065
410 Losses {'entity_linker': 121.08166352286935}
Time elapsed: 16:30:33.42 - at 2021-11-06 09:35:41.297676
415 Losses {'entity_linker': 119.88355445023626}
Time elapsed: 16:42:38.52 - at 2021-11-06 09:47:46.400294
420 Losses {'entity_linker': 121.11153571307659}
Time elapsed: 16:54:42.04 - at 2021-11-06 09:59:49.921937
425 Losses {'entity_linker': 119.8680424420163}
Time elapsed: 17:06:46.09 - at 2021-11-06 10:11:53.973012
430 Losses {'entity_linker': 122.27820878755301}
Time elapsed: 17:18:50.13 - at 2021-11-06 10:23:58.012605
435 Losses {'entity_linker': 121.86225686222315}
Time elapsed: 17:30:53.89 - at 2021-11-06 10:36:01.764816
440 Losses {'entity_linker': 121.2234854279086}
Time elapsed: 17:42:58.01 - at 2021-11-06 10:48:05.889355
445 Losses {'entity_linker': 121.33395940996706}
Time elapsed: 17:55:02.16 - at 2021-11-06 11:00:10.035161
450 Losses {'entity_linker': 120.08713430073112}
Time elapsed: 18:07:04.23 - at 2021-11-06 11:12:12.111882
455 Losses {'entity_linker': 121.44447956979275}
Time elapsed: 18:19:06.96 - at 2021-11-06 11:24:14.839405
460 Losses {'entity_linker': 121.0112966671586}
Time elapsed: 18:31:11.53 - at 2021-11-06 11:36:19.408886
465 Losses {'entity_linker': 121.41687434818596}
Time elapsed: 18:43:16.26 - at 2021-11-06 11:48:24.134297
470 Losses {'entity_linker': 119.72364311758429}
Time elapsed: 18:55:20.12 - at 2021-11-06 12:00:27.996676
475 Losses {'entity_linker': 120.77664417214692}
Time elapsed: 19:07:24.54 - at 2021-11-06 12:12:32.414234
480 Losses {'entity_linker': 119.86388497333974}
Time elapsed: 19:19:29.57 - at 2021-11-06 12:24:37.444944
485 Losses {'entity_linker': 119.7786033526063}
Time elapsed: 19:31:34.86 - at 2021-11-06 12:36:42.738120
490 Losses {'entity_linker': 120.38681644294411}
Time elapsed: 19:43:39.63 - at 2021-11-06 12:48:47.511459
495 Losses {'entity_linker': 119.29969577398151}
Time elapsed: 19:55:45.23 - at 2021-11-06 13:00:53.109371
499 Losses {'entity_linker': 120.38904602453113}

The command line process doesn't seem to save a large model either, where the notebook does.

Is there a way to get metrics for both the train and test data during training via the notebook, or what might be wrong with the command line training?

All code and data are available in the GitHub link. Thanks

Hi!

This question would have probably been more appropriate for the spaCy discussions forum, as it doesn't seem to involve Prodigy directly.

I'm not sure I understand exactly what it is you're trying to achieve. The picture you shared of the loss curve, where is that coming from? In general, spaCy only provides figures for the loss on the training dataset, and figures for accuracy, F-score, precision and recall on the dev set. You should be able to calculate the F-score on your training set with the scorer as well. You said:

I've tried using Scorer, but it hasn't worked with my code.

But I'm afraid there's little I can do or say to help without more details. In particular, a minimal example of a code snippet that isn't working would be helpful.
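On the point about computing an F-score for the training set: in spaCy v3, nlp.evaluate(examples) returns a dictionary of scores for whichever Example objects it is given, so passing the training examples instead of the dev examples should give training-set figures. To make clear what such a micro-averaged score measures, here is a minimal hand-rolled sketch in plain Python. It is not spaCy's Scorer and may count edge cases differently; the function name micro_prf and the dict-based data layout are assumptions made up for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per document, mapping a mention's
# (start_char, end_char) offsets to the KB identifier it links to.
Span = Tuple[int, int]
Links = Dict[Span, str]


def micro_prf(gold_docs: List[Links], pred_docs: List[Links]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F-score over all mentions."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        # A predicted link is correct only if the same span has the same KB id.
        for span, kb_id in pred.items():
            if gold.get(span) == kb_id:
                tp += 1
            else:
                fp += 1
        # A gold link is missed if it was not predicted with the same KB id.
        for span, kb_id in gold.items():
            if pred.get(span) != kb_id:
                fn += 1
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"p": p, "r": r, "f": f}


# Toy data (invented): two gold mentions, one of them linked wrongly.
gold = [{(0, 5): "Q1", (10, 15): "Q2"}]
pred = [{(0, 5): "Q1", (10, 15): "Q3"}]
scores = micro_prf(gold, pred)
print(scores)
```

Calling the same function once with the training documents and once with the dev documents after each iteration would give the two curves needed for a train-versus-validation plot.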

But the f-score that is output doesn't seem to improve:

If you're running this on the example project, which only contains a few data points, this kind of output is to be expected. The F-score is already 93% on the dev set, and the ML algorithm isn't able to improve on it further because the training data is so limited.

Whereas the loss data from the notebook is ...

This notebook was originally written for spaCy v2, in which it was normal to write custom training loops. In v3, however, we recommend training from the command line in combination with a training config file. Both should eventually amount to the same thing, though. Do note that the notebook prints losses every 5 iterations, whereas the spacy train command prints every 200 steps by default (the eval_frequency setting in the training config), so the two logs do not line up row by row.
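If the goal is to plot the losses the notebook prints, the log lines can be turned into (iteration, loss) pairs with a few lines of Python. The following is only a sketch that assumes the exact line format shown in the pasted log ("N Losses {...}" with "Time elapsed" lines in between); the function name parse_losses is made up for this example.

```python
import ast
import re
from typing import List, Tuple

# Matches lines such as: 5 Losses {'entity_linker': 282.10121862217784}
LOSS_LINE = re.compile(r"^(\d+) Losses (\{.*\})$")


def parse_losses(log_text: str, component: str = "entity_linker") -> List[Tuple[int, float]]:
    """Extract (iteration, loss) pairs for one component from the notebook log."""
    points = []
    for line in log_text.splitlines():
        match = LOSS_LINE.match(line.strip())
        if match is None:
            continue  # skips the "Time elapsed: ..." lines
        iteration = int(match.group(1))
        # The printed dict is a Python literal, so it can be parsed safely.
        losses = ast.literal_eval(match.group(2))
        points.append((iteration, float(losses[component])))
    return points


# First lines of the log from the question.
sample = """0 Losses {'entity_linker': 518.577010139823}
Time elapsed: 00:07:54.11 - at 2021-11-05 17:13:01.988861
5 Losses {'entity_linker': 282.10121862217784}
Time elapsed: 00:19:41.16 - at 2021-11-05 17:24:49.043269
"""
points = parse_losses(sample)
print(points)
```

The resulting pairs can be passed to any plotting library. Keep in mind that this is the training loss only, and that loss values and dev-set F-scores are on different scales, so they are usually drawn on separate axes.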

The command line process doesn't seem to save a large model either, where the notebook does.

Is the spacy train command specifying an output dir?

Apologies, I didn't realize that there was a discussions section on GitHub. I'll move my discussions there.