I'm training a text categorization (textcat) model on emails, each with a maximum of 150 tokens. I have a separate evaluation set of 5 positive and 5 negative examples. During training the loss goes down, but the F-score goes up and down. This is different from what I normally see, where the F1 score goes up while the loss goes down. After 30 iterations, the model's F1 score is 0.63, which is well below the baseline of 0.8. What could be wrong?
Here is the training output:
Baseline accuracy: 0.800
Loss F-Score
1 16.84 0.500
2 12.83 0.367
3 8.44 0.667
4 6.25 0.800
5 7.50 0.667
6 4.91 0.633
7 4.32 0.900
8 4.07 0.800
9 2.79 0.800
10 2.56 0.733
11 0.98 0.767
12 0.19 0.700
13 0.02 0.700
14 0.00 0.667
15 0.10 0.633
16 0.05 0.633
17 0.01 0.633
18 0.01 0.600
19 0.03 0.600
20 0.02 0.567
21 0.00 0.533
22 0.10 0.533
23 0.03 0.500
24 0.00 0.533
25 0.00 0.533
26 0.02 0.533
27 0.06 0.533
28 0.00 0.567
29 0.00 0.633
30 0.00 0.633
============================= Results summary =============================
Label ROC AUC
mnpi 0.900
Best ROC AUC 0.900
Baseline 0.800
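
To make the pattern in the training output above easier to see, here is a minimal sketch that parses the per-iteration rows and reports the iteration with the highest score next to the final one. The names `LOG` and `parse_rows` are only for illustration and are not part of any library; the rows are copied by hand from the output above.

```python
# Per-iteration rows copied from the training output: iteration, loss, score.
LOG = """
1 16.84 0.500
2 12.83 0.367
3 8.44 0.667
4 6.25 0.800
5 7.50 0.667
6 4.91 0.633
7 4.32 0.900
8 4.07 0.800
9 2.79 0.800
10 2.56 0.733
11 0.98 0.767
12 0.19 0.700
13 0.02 0.700
14 0.00 0.667
15 0.10 0.633
16 0.05 0.633
17 0.01 0.633
18 0.01 0.600
19 0.03 0.600
20 0.02 0.567
21 0.00 0.533
22 0.10 0.533
23 0.03 0.500
24 0.00 0.533
25 0.00 0.533
26 0.02 0.533
27 0.06 0.533
28 0.00 0.567
29 0.00 0.633
30 0.00 0.633
"""


def parse_rows(text):
    """Turn the whitespace-separated log rows into (iteration, loss, score) tuples."""
    rows = []
    for line in text.strip().splitlines():
        iteration, loss, score = line.split()
        rows.append((int(iteration), float(loss), float(score)))
    return rows


rows = parse_rows(LOG)

# Row with the highest evaluation score (the first one wins on ties).
best = max(rows, key=lambda row: row[2])
last = rows[-1]

print(f"best:  iteration {best[0]}, loss {best[1]:.2f}, score {best[2]:.3f}")
print(f"final: iteration {last[0]}, loss {last[1]:.2f}, score {last[2]:.3f}")
```

On these rows the highest score is 0.900 at iteration 7, where the loss is still 4.32, while the final iteration sits at 0.633 with the loss at 0.00.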