Validation within Prodigy (Cross Validation)

NadineB · October 30, 2019, 2:02pm

Hello Prodigy-Team,

I am currently training a binary classification model. I used 20% of the data for validation and 80% for training (eval-split 0.2). The evaluation results already look decent. Now I'd like to ask whether cross-validation is possible within Prodigy, so that Prodigy does not always use the same data when I set the eval-split to 20%?

To get an even better model, I first looked at which eval-split gives me the best model. Do you think this approach makes sense?
After determining the best split, I then want to experiment with the parameters "batch-size" and "n-iter" to improve the model further. Once I have the best model, I want to export it and test it in a Python environment on new datasets (data that Prodigy hasn't seen, to check whether the model is overfitted). Does this make sense to you?
To test the exported model on the new data I have to set a threshold score, so that the program knows at what score an example should be considered relevant. How do I determine such a threshold? And how does Prodigy set such a threshold during validation?

Thanks in advance!!!
Best regards
Nadine

honnibal · November 3, 2019, 2:33pm

Hi @NadineB,

We don't have cross-validation as a default recipe, as we usually find it's less useful than keeping a stable evaluation set. You can always split up the data yourself if you need to run it.
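
As a rough sketch of what a stable evaluation set could look like if you split the data yourself: the snippet below assigns each example to train or evaluation based on a hash of its text, so the same example always lands on the same side, even after you annotate more data. The function names, the `"text"` key and the generated examples are assumptions made for illustration, not part of Prodigy.

```python
import hashlib


def is_eval_example(text, eval_fraction=0.2):
    """Decide from a hash of the text whether an example belongs to the evaluation set."""
    digest = hashlib.md5(text.encode("utf8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket < round(eval_fraction * 10_000)


def split_examples(examples, eval_fraction=0.2):
    """Split a list of dicts with a "text" key into (train, evaluation)."""
    train, evaluation = [], []
    for eg in examples:
        if is_eval_example(eg["text"], eval_fraction):
            evaluation.append(eg)
        else:
            train.append(eg)
    return train, evaluation


# Made-up examples standing in for exported annotations.
examples = [{"text": f"example text {i}", "answer": "accept"} for i in range(1000)]
train, evaluation = split_examples(examples)
print(len(train), len(evaluation))
```

Because the assignment depends only on the text, rerunning the split after adding new annotations never moves an old example from evaluation into training.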

I do see a problem with searching for the best eval-split. If you compare different splits, you're changing both the training and the evaluation data. So you could just as easily be finding the split that happens to have the easiest examples in its evaluation set.

It can make sense to tune the batch size and number of iterations. But you should take care when doing this on a small dataset: there's a lot of random variation, so an apparent improvement may not be reliable. You might just happen to do better on the few examples you're evaluating against.
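
To get a feeling for how much an accuracy figure can move by chance on a small evaluation set, here is a back-of-the-envelope sketch. It uses the normal approximation to the binomial distribution, which is only rough for small samples, and the function name and the numbers are made up for illustration.

```python
import math


def accuracy_interval(accuracy, n_examples, z=1.96):
    """Approximate 95% interval for an accuracy measured on n_examples."""
    stderr = math.sqrt(accuracy * (1 - accuracy) / n_examples)
    low = max(0.0, accuracy - z * stderr)
    high = min(1.0, accuracy + z * stderr)
    return low, high


# Hypothetical case: 80% accuracy measured on 50 evaluation examples.
low, high = accuracy_interval(0.8, 50)
print(f"{low:.2f} to {high:.2f}")
```

With these numbers the interval runs from about 0.69 to 0.91, so a change of a couple of points after tweaking the batch size could easily be noise.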

Typically you would adjust the threshold based on whether you care more about false positives, or false negatives. If you care about them equally, a threshold of 0.5 seems fine.
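
A minimal sketch of that trade-off, assuming you have a score between 0 and 1 and a gold label for each evaluation example. The scores and labels below are invented for illustration, and `precision_recall` is a hypothetical helper, not a Prodigy function.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold counts as a positive prediction."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Invented model scores and gold labels (True = relevant).
scores = [0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.10]
labels = [True, True, False, True, True, False, False, False]

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, lowering the threshold to 0.3 catches every relevant example but lets in more false positives, while raising it to 0.7 removes the false positives and misses half of the relevant examples.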

NadineB · November 4, 2019, 2:34pm

Thanks for the response!

Can you explain how I can split up the data myself? A quick example of how it works would be fantastic!

Makes sense! But Prodigy doesn't know whether I care more about false positives or false negatives either. So does Prodigy use a threshold of 0.5 when evaluating on the validation set?

Thanks in advance!

NadineB · November 6, 2019, 8:44am

And I've got one more question, @honnibal! After running the cross-validation and finding the best number of epochs and the best batch size, I want to train a model on the whole dataset with those parameter values. So here I have to use 0% as eval-split so that the model uses all the data for training. But since there is no evaluation data, how can Prodigy choose the "best model"? Or does Prodigy just take the model from the last epoch?

Thanks in advance !!

honnibal · November 7, 2019, 11:55am

You might find the data splitting functions in scikit-learn helpful, for example sklearn.model_selection.KFold. They also have a lot of other utilities that might help your experiments.
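
A quick sketch of how `KFold` could be used here, assuming your annotations are loaded as a list of dicts. The examples, the `"label"` value and the choice of 5 folds are made up for illustration.

```python
from sklearn.model_selection import KFold

# Made-up examples standing in for exported annotations.
examples = [{"text": f"example text {i}", "label": "RELEVANT"} for i in range(10)]

# shuffle with a fixed random_state so the folds are reproducible.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

folds = []
for train_idx, eval_idx in kfold.split(examples):
    train = [examples[i] for i in train_idx]
    evaluation = [examples[i] for i in eval_idx]
    folds.append((train, evaluation))

for n, (train, evaluation) in enumerate(folds):
    print(f"fold {n}: {len(train)} train, {len(evaluation)} eval")
```

Each example ends up in the evaluation part of exactly one fold. You would then train and evaluate once per fold and average the scores.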

Yes, that's correct: the evaluation uses a threshold of 0.5.

Some people do this process of retraining on the whole dataset, so there are definitely people who'll advocate for that workflow. I'm in the other camp: I think it's really not a good idea, for the reason you mentioned. Without development data there's no way to choose between different models. You're also really vulnerable to something going wrong. Neural networks are a bit random, especially on small datasets: sometimes you get an unlucky initialisation or data order, and the model doesn't converge to a good solution. With no development data, you're running blind. You could get unlucky and your accuracy could have cratered on your last training, the one you're about to ship to production, and you'd never know.

So I say: just don't do that.
