How are evaluation datasets used

I'm not sure how the evaluation dataset is used. I want to "freeze" a dataset that my models are never allowed to see when training and developing/tuning. Does eval: qualify for that?

See my related question in spacy discussions.

Yes, the eval: prefix lets you specify a dataset used only for evaluation and to calculate the accuracy scores. If you're serious about training, you typically want to use a fixed dataset here that never changes so you can meaningfully compare your results across experiments.
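For example, the `eval:` prefix goes on the dataset argument of `prodigy train` (the dataset names below are hypothetical):

```sh
# "my_train_data" is used for training, "my_eval_data" only for evaluation
prodigy train ./output --ner my_train_data,eval:my_eval_data
```

Without an `eval:` dataset, Prodigy holds out a random split of your training data instead, which can change between runs and makes experiments harder to compare.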

(Of course, you still need to make sure that no training examples are present in your evaluation data. One way to double-check this is to export your data with data-to-spacy and run spaCy's debug data, which will tell you if you ended up with duplicates.)
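A sketch of that workflow (again with hypothetical dataset names; check the exact paths against what `data-to-spacy` writes out for your version):

```sh
# Export both sets to spaCy's binary training format
prodigy data-to-spacy ./corpus --ner my_train_data,eval:my_eval_data

# Inspect the exported corpus; reports duplicates, overlaps and other issues
python -m spacy debug data ./corpus/config.cfg
```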


But currently the eval data is actually used to decide when to stop training if patience is set, right? So the eval data is implicitly being used during training. That might be okay, but have I understood it correctly?

The evaluation data is used to calculate the accuracy by comparing the model's predictions against the correct answers in the (unseen) evaluation examples. In the default configuration, that accuracy is then used to decide when to stop training: once it stops improving, training ends.
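Under the hood this is spaCy's early-stopping mechanism, controlled by the `[training]` block of the config. The values below are spaCy's defaults at the time of writing (worth verifying for your version):

```ini
[training]
# stop if the score hasn't improved for this many evaluation steps
patience = 1600
# evaluate on the eval data every N steps
eval_frequency = 200
# hard upper limit on training steps
max_steps = 20000
```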
