I use Prodigy to annotate data, but I generally use spaCy to train the model. What confuses me is that when I run train-curve on my dataset, the model actually reaches a higher score than when I train the model with the same dataset, despite using the same config (I'm providing a base model to train-curve).
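For reference, this is roughly what I'm running (dataset, path and model names are placeholders, and the exact flags may vary by version):

```bash
# Learning curve over the Prodigy dataset, with the same config and base model:
prodigy train-curve --ner my_dataset --config config.cfg --base-model en_core_web_md

# Export the same annotations, then train with spaCy directly:
prodigy data-to-spacy ./corpus --ner my_dataset
python -m spacy train config.cfg --output ./output \
    --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy
```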
What's behind this? Is this expected?
Any pointers would be much appreciated!
PS: train-curve is very helpful, thank you for including a recipe for this!
Hi! That's definitely strange. Under the hood, the train-curve recipe just calls into the train recipe and does exactly the same thing, especially in the current nightly. How different are your scores, and are you comparing the final score from train and train-curve? And could you share the two outputs, one from train-curve and one from train?
If the better score in train-curve is not the final score but comes from one of the runs with less data (25%, 50%, 75%), that would indicate that your model performs worse with more data, which could point to a problem with the data, e.g. inconsistencies in the annotations.
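Just to make that diagnostic concrete, here's a tiny sketch with made-up numbers (nothing here comes from a real run):

```python
# Hypothetical train-curve-style results: the accuracy reached at each
# fraction of the training data. These numbers are invented for illustration.
scores = {0.25: 0.84, 0.50: 0.88, 0.75: 0.91, 1.00: 0.89}

best = max(scores, key=scores.get)
if best < 1.0:
    # The model peaked before seeing all the data: more annotations made
    # things worse, which often points to inconsistent or noisy labels.
    print(f"Best score at {best:.0%} of the data, not 100% -- check the annotations.")
else:
    print("Score keeps improving with more data, as expected.")
```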
I did compare the final score from train-curve with the score from spacy train. Unfortunately I don't have the outputs anymore and I can't reproduce it either. My dataset has grown since I last ran train-curve and now this discrepancy doesn't seem to exist anymore.
Do you remember how large the difference was? Was it a few percent, or more like 0.1%? I think train-curve might round to one fewer digit than the train output, so it might have been just that?
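For example, with a purely made-up score (I don't know the actual display precision of either recipe off-hand):

```python
# Purely illustrative: the same underlying score printed at two different
# precisions can look like a real difference between two runs.
score = 0.9047
print(f"{score:.3f}")  # 0.905 -- three digits
print(f"{score:.2f}")  # 0.90  -- one fewer digit
```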
Anyway, definitely keep us updated in case this comes up again!
It was a score of 0.90 vs. 0.91. I think the 0.90 was rounded up from 0.89-something, so I don't think it was just a rounding error.
I'll definitely update this thread if it happens again. Thank you!