what to do if train-curve shows slight decrease in last sample

Hello @SofieVL

I was checking this thread, because it's exactly my own case at the moment. What I am doing is training a custom NER model, which recognizes 4 different labels (there are a few more details here). Following Ines Montani's introductory video, I started off with 500 texts (80-20 split), just to see how things were going, and I got this result. As you can see, train-curve suggested to "add more labeled samples to training set", which I did: by adding 500 more samples (for a total of 1,000 samples; 80-20 split), I got the following plot:
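For context, the 80-20 splits mentioned above were done along these lines. This is just a generic shuffle-and-slice sketch (function name and seed are my own, and it is not how Prodigy splits data internally):

```python
import random

def train_eval_split(examples, eval_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split off an evaluation portion."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    # First n_eval items become the eval set, the rest the training set
    return shuffled[n_eval:], shuffled[:n_eval]

# For 1,000 annotated texts this yields an 800/200 split
train, evaluation = train_eval_split(range(1000))
print(len(train), len(evaluation))  # 800 200
```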

Comparing those results with the first test, two quick observations can be made:

  • The metric increase was not very significant.
  • Again, adding more samples could improve the model's performance.

This time, I decided to increase the number of added samples a bit more: 4,000 labeled texts were used (80-20 split), obtaining the following results:

Sadly, this time the performance worsened.

With these results, I have the following questions:

  1. What metric does train-curve report? (accuracy, precision, F1...)
  2. Intuitively, it seems that this model is overfitting; however, if fewer samples are used, I see that the metric never surpasses 0.6 (which is a bit worrying, since Ines, in the video mentioned before, obtained a metric above 0.7 on her first try). Here you mentioned "it's worth reconsidering the annotation guidelines for your approach"; where are those annotation guidelines located?

Sorry if these are too many questions; please let me know if I should open separate cases for each.

Thank you!