Simple classification task with high loss

Hi there,

we trained a really simple text classifier with just one label: a text either IS part of Category1 or it is not. But when we train our spaCy model, we get a really bad loss despite good overall results. The results in production on unknown, real user input are also quite good. So we are wondering why our loss is that bad and how we can fix it. With a loss like that, our model seems useless and our training data seems invalid. My last hope is that we did something wrong.

First try:

LOSS F-SCORE ACCURACY

...
32 120.295 0.764 0.820
33 121.602 0.759 0.817
34 121.527 0.761 0.817
35 119.266 0.762 0.818
36 118.976 0.763 0.818
37 124.092 0.760 0.816
38 121.663 0.758 0.815
39 120.062 0.755 0.813
40 119.582 0.757 0.815
41 118.198 0.754 0.812
42 121.993 0.757 0.814
43 117.586 0.758 0.815
44 118.117 0.756 0.813
45 114.655 0.751 0.809
46 117.918 0.746 0.806
47 117.320 0.747 0.806
48 114.923 0.741 0.802
49 114.487 0.742 0.802
50 114.845 0.738 0.799

MODEL USER COUNT
accept accept 1103
accept reject 293
reject reject 1994
reject accept 350

Correct 3097
Incorrect 643

Baseline 0.61
Precision 0.79
Recall 0.76
F-score 0.77
Accuracy 0.83
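For reference, all of these evaluation numbers follow directly from the confusion matrix above, so the metrics themselves look internally consistent. A quick sketch to reproduce them (treating MODEL=accept / USER=accept as a true positive, and the baseline as always predicting the majority class):

```python
# Confusion matrix from the first run (MODEL vs USER).
tp = 1103  # model accept, user accept
fp = 293   # model accept, user reject
tn = 1994  # model reject, user reject
fn = 350   # model reject, user accept

total = tp + fp + tn + fn
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_score = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total
# Baseline: accuracy of always predicting the majority class (here: reject).
baseline = max(tp + fn, fp + tn) / total

print(f"Baseline  {baseline:.2f}")   # 0.61
print(f"Precision {precision:.2f}")  # 0.79
print(f"Recall    {recall:.2f}")     # 0.76
print(f"F-score   {f_score:.2f}")    # 0.77
print(f"Accuracy  {accuracy:.2f}")   # 0.83
```

So the model is clearly doing much better than the 0.61 majority-class baseline, regardless of what the loss column says.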

Another try:

:~# python3.5 -m prodigy textcat.batch-train meda de_core_news_sm --output medamodel --n-iter 200 --eval-split 0.3 --dropout 0.4

Loaded model de_core_news_sm
Using 30% of examples (6000) for evaluation
Using 100% of remaining examples (14003) for training
Dropout: 0.4 Batch size: 10 Iterations: 200

LOSS F-SCORE ACCURACY

01 275.261 0.667 0.780
02 213.948 0.721 0.805
03 192.139 0.750 0.819
04 179.622 0.763 0.826
05 174.347 0.765 0.828
06 170.077 0.774 0.834
07 168.207 0.769 0.831
08 167.648 0.766 0.828
09 164.801 0.769 0.830
10 161.509 0.762 0.823
11 160.314 0.762 0.823
12 158.902 0.754 0.818
13 157.673 0.758 0.820
14 153.786 0.754 0.817
15 152.665 0.755 0.817
16 146.840 0.754 0.816
17 145.121 0.751 0.813

Trained with all samples:

:~# python3.5 -m prodigy textcat.batch-train onmeda de_core_news_sm --output onmedamodel --n-iter 10 --eval-split 1 --dropout 0.2

Loaded model de_core_news_sm
Using 100% of examples (20003) for evaluation
Using 100% of remaining examples (20003) for training
Dropout: 0.2 Batch size: 10 Iterations: 10

LOSS F-SCORE ACCURACY

01 366.735 0.805 0.859
02 241.132 0.856 0.893
03 216.938 0.872 0.905
04 211.305 0.883 0.912
05 207.649 0.886 0.915
06 201.347 0.893 0.920
07 201.519 0.896 0.921
08 201.949 0.897 0.922
09 195.804 0.899 0.923
10 193.449 0.901 0.925

MODEL USER COUNT
accept accept 6617
accept reject 688
reject reject 11256
reject accept 762

Correct 17873
Incorrect 1450

Baseline 0.62
Precision 0.91
Recall 0.90
F-score 0.90
Accuracy 0.92

Why do you care about the loss? Often it’s not good for the loss to go to zero, and often it won’t, especially with dropout.
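To make that concrete, here's a small illustration using a cross-entropy-style loss (used purely as an example; spaCy's internal textcat loss may be computed differently). The per-example loss only reaches zero when the model assigns probability 1.0 to the correct class, so a model that correctly predicts everything at, say, p ≈ 0.9 still accumulates a large summed loss over thousands of examples:

```python
import math

def cross_entropy(p_correct: float) -> float:
    """Loss for a single example, given the probability assigned to the true class."""
    return -math.log(p_correct)

n_examples = 14003  # training-set size from the second run above

for p in (0.7, 0.9, 0.99):
    per_example = cross_entropy(p)
    print(f"p={p}: per-example loss={per_example:.4f}, "
          f"summed over {n_examples} examples={per_example * n_examples:.1f}")
```

The point is qualitative: a sizeable nonzero loss is entirely compatible with a model that classifies almost everything correctly, and dropout (0.2–0.4 in your runs) additionally perturbs the training-time predictions, which keeps the reported loss away from zero. Judge the model by precision, recall and F-score on held-out data, not by the absolute loss value.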