Better score with learning-curve than with train?

lnatprodigy · July 16, 2021, 1:40pm

Hi,

I use Prodigy to annotate data, but I generally use spaCy to train the model. What confuses me is that when I run train-curve on my dataset, the model actually reaches a higher score than when I train the model on the same dataset, even though I'm using the same config (I'm passing a base model to train-curve).

What's behind this? Is this expected?

Any pointers would be much appreciated!

PS: train-curve is very helpful, thank you for including a recipe for this!

ines · July 18, 2021, 2:54am

Hi! That's definitely strange. Under the hood, the train-curve recipe just calls into the train recipe and does exactly the same thing, especially in the current nightly. How different are your scores, and are you comparing the final score from train and train-curve? Could you also share the two outputs, one from train-curve and one from train?
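Conceptually, a learning curve repeats the same training call on growing slices of the data and evaluates every run on the same held-out set, so the 100% point is just an ordinary training run. The sketch below illustrates that idea in plain Python under stated assumptions: it is not Prodigy's implementation, and `train`, `evaluate` and `learning_curve` are hypothetical placeholders (a toy majority-label model) standing in for a real spaCy training run.

```python
import random
from collections import Counter


# Hypothetical toy model: it simply predicts the most frequent training label.
# It stands in for a real spaCy training run so the loop can execute.
def train(examples):
    counts = Counter(label for _, label in examples)
    return counts.most_common(1)[0][0]


def evaluate(model, eval_examples):
    # Accuracy of the constant prediction on the fixed evaluation set.
    correct = sum(1 for _, label in eval_examples if label == model)
    return correct / len(eval_examples)


def learning_curve(train_examples, eval_examples,
                   fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    # Shuffle once so each smaller subset is a prefix of the larger ones.
    shuffled = list(train_examples)
    random.Random(seed).shuffle(shuffled)
    scores = {}
    for frac in fractions:
        n = max(1, int(len(shuffled) * frac))
        # Same train/evaluate calls for every run; only the data size changes.
        model = train(shuffled[:n])
        scores[frac] = evaluate(model, eval_examples)
    return scores


train_examples = [("a", "POS"), ("b", "POS"), ("c", "NEG"), ("d", "POS"),
                  ("e", "POS"), ("f", "NEG"), ("g", "POS"), ("h", "POS")]
eval_examples = [("x", "POS"), ("y", "POS"), ("z", "NEG")]

scores = learning_curve(train_examples, eval_examples)
# The 100% point is the same as training once on all of the data.
assert scores[1.0] == evaluate(train(train_examples), eval_examples)
print(scores)
```

In this toy version the 100% score matches a single full training run by construction, which is why the final train-curve score and a plain train run on identical data and config are expected to agree.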

If the better score in train-curve is not the final score but from one of the runs with less data (25%, 50%, 75%), this would indicate that your model is performing worse with more data and could point to a problem with the data, e.g. inconsistencies in the annotation.
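To spot the pattern described above, one could compare consecutive points of the curve and report every step where the score went down after adding data. A minimal sketch; `find_drops` and its `min_drop` threshold are hypothetical names, and the scores are illustrative, not taken from this thread.

```python
def find_drops(scores, min_drop=0.0):
    """Return (smaller_fraction, larger_fraction, change) for every step where
    the score fell by more than min_drop when more training data was added."""
    fractions = sorted(scores)
    drops = []
    for prev, curr in zip(fractions, fractions[1:]):
        change = scores[curr] - scores[prev]
        if change < -min_drop:
            drops.append((prev, curr, change))
    return drops


# Illustrative learning-curve scores keyed by fraction of the training data.
curve = {0.25: 0.84, 0.5: 0.88, 0.75: 0.91, 1.0: 0.90}

for prev, curr, change in find_drops(curve):
    # prints: score fell by 0.010 from 75% to 100% of the data
    print(f"score fell by {-change:.3f} from {prev:.0%} to {curr:.0%} of the data")
```

The `min_drop` threshold lets you ignore very small changes; what counts as meaningful is a judgment call.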

lnatprodigy · July 18, 2021, 4:40am

Hi,

I did compare the final score from train-curve with the score from spacy train. Unfortunately I don't have the outputs anymore and I can't reproduce it either. My dataset has grown since I last ran train-curve and now this discrepancy doesn't seem to exist anymore.

ines · July 18, 2021, 5:42am

Do you remember how large the difference was? Was it a few percent, or more like 0.1%? I think train-curve might round to one fewer decimal place than the train output, so maybe it was just that?
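As a quick illustration of how display precision alone can make two nearly identical scores look different: the scores below are made up, `fmt` is a hypothetical helper, and the number of digits each recipe actually prints is not confirmed here.

```python
def fmt(score, digits):
    # Format a score with a fixed number of decimal places.
    return f"{score:.{digits}f}"


# Two hypothetical scores that differ by only 0.0002.
score_a = 0.9049
score_b = 0.9051

print(fmt(score_a, 4), "vs", fmt(score_b, 4))  # 0.9049 vs 0.9051
print(fmt(score_a, 2), "vs", fmt(score_b, 2))  # 0.90 vs 0.91
```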

Anyway, definitely keep us updated in case this comes up again!

lnatprodigy · July 18, 2021, 7:02am

It was a score of 0.90 vs 0.91. I think the 0.90 was rounded from 0.89-something, so I don't think it was just a rounding error.
I'll definitely update this thread if it happens again. Thank you!
