Hello @SofieVL
I was reading this thread because it describes exactly my current situation. What I am doing is training a custom NER model that recognizes 4 different labels (there are a few more details here). Following Ines Montani's introductory video, I started off with 500 texts (80-20 split), just to see how things would go, and I got this result. As you can see, train-curve suggested to "add more labeled samples to training set", which I did: after adding 500 more samples (for a total of 1000 samples; 80-20 split), I got the following plot:
Comparing these results with the first test, two quick observations can be made:
- The increase in the metric was not very significant.
- Again, adding more samples could improve the model's performance.
I then decided to increase the number of added samples a bit more: 4000 labeled texts were used this time (80-20 split), with the following results:
Sadly, this time the performance worsened.
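In case it helps, this is roughly how I'm invoking train-curve for each of these experiments (a minimal sketch: the dataset name is a placeholder, and the exact arguments may vary with the Prodigy version):

```
# Sketch of the train-curve call used for each experiment
# (my_ner_dataset is a placeholder; arguments may differ by Prodigy version)
prodigy train-curve --ner my_ner_dataset --eval-split 0.2
```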
With these results, I have the following questions:
- What is the metric reported by train-curve? (accuracy, precision, F1...)
- Intuitively, it seems that this model is overfitting; however, if fewer samples are used, I see that the "metric" never surpasses 0.6 (which is a bit worrying, as Ines, in the video mentioned above, obtained a metric above 0.7 on her first try).
- Here you mentioned "it's worth reconsidering the annotation guidelines for your approach"; where are those annotation guidelines located?
Sorry if these are too many questions; please let me know if I should open a separate case for each.
Thank you!