how does prodigy data-to-spacy --eval-split do the split?

ivan · May 13, 2022, 2:37pm

What is the strategy for creating the eval split? Is any type of stratification used?

Cheers,
Ivan

ljvmiranda921 · May 15, 2022, 11:59pm

Hi @ivan, the --eval-split parameter performs a straightforward cut of the dataset based on the fraction you pass (usually 0.2). No stratification is applied. If you want a more complex split, it may be better to do it as a preprocessing step and just pass in the .spacy files with the split you want.
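
For anyone wanting to do the split as a preprocessing step, here is a minimal sketch of a stratified split. It is not Prodigy's own code. It assumes each example is a dict with a single label under a hypothetical `label` key (as in a single-label text classification task); `stratified_split`, `label_key` and `seed` are names made up for illustration. For NER or multi-label data you would need to decide for yourself what to stratify on.

```python
import random
from collections import defaultdict


def stratified_split(examples, label_key="label", eval_split=0.2, seed=0):
    """Split examples into (train, dev) so each label keeps roughly
    the same share in both parts.

    Assumption: every example is a dict with one hashable, sortable
    label stored under `label_key` (a hypothetical field name).
    """
    # Group the examples by their label.
    by_label = defaultdict(list)
    for eg in examples:
        by_label[eg[label_key]].append(eg)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, dev = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Take the requested fraction of this label's examples for dev.
        n_dev = round(len(group) * eval_split)
        dev.extend(group[:n_dev])
        train.extend(group[n_dev:])
    return train, dev
```

The two resulting lists could then be converted to spaCy `Doc` objects and saved as separate .spacy files (for example with spaCy's `DocBin`), which is left out here to keep the sketch dependency-free.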

ivan · May 16, 2022, 5:11pm

Makes sense. I have been a bit lazy, relying on the random split for every new labelling campaign, so I will need to get that under control. It is also slightly complicated to keep a constant validation set when the training set is constantly changing with new annotations. I need a way of recording what is already in the validation set versus what I might want to add to it based on the new annotations.
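
One possible way to keep the validation set stable as new annotations arrive is to assign each example to a split deterministically from a hash of its text, so an example that once landed in the validation set always lands there again. This is only a sketch under that assumption, not something Prodigy does for you; `assign_split` is a hypothetical helper name.

```python
import hashlib


def assign_split(text, eval_split=0.2):
    """Return "dev" or "train" for a text, based only on its content.

    The same text always maps to the same split, no matter how many
    other examples are added to the dataset later.
    """
    digest = hashlib.sha256(text.encode("utf8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "dev" if bucket < eval_split else "train"
```

With this approach there is nothing to record separately: re-running the assignment over the full dataset after each labelling campaign reproduces the earlier validation set and adds roughly the same fraction of the new examples to it. The trade-off is that the split is only approximately the requested size and is not stratified.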
