How to extract the dev set from the prodigy train recipe

I am using the nightly version 1.11.0a8.

I am using the prodigy train recipe to train an NER model on a small dataset, so I used the -es flag to split it into training and evaluation sets.

Now I want to visualize and A/B-test model-best on the dev set.

However, I don't know where to find, or how to export, the randomly split dataset that the prodigy train recipe used.

Is it correct to assume that if I pass data-to-spacy the same -es ratio that I used with train, it will export exactly the same dev set?

In other words, do prodigy train and data-to-spacy use the same seed for randomly splitting the prodigy dataset? If not, what workflow would you recommend for training with prodigy datasets?

Thank you.

Yes, both recipes call into the same function, so with the same dataset and the same -es value the data they produce should be identical. The train recipe is mostly intended for quick experiments and doesn't export the dataset. If you want to export the data first and then train, you can run data-to-spacy and then train with spaCy directly, using the previously exported data.
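
As a rough illustration of why a shared split function gives both recipes the same dev set, here is a minimal sketch of a seeded split. This is not Prodigy's actual code: split_examples, its seed default and the toy examples are all hypothetical, and it only assumes that the split shuffles with a fixed seed before cutting off the evaluation portion.

```python
import random


def split_examples(examples, eval_split, seed=0):
    """Shuffle a copy of the examples with a fixed seed, then cut off the dev part.

    Hypothetical helper for illustration only, not Prodigy's implementation.
    """
    shuffled = list(examples)
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * eval_split)
    return shuffled[n_dev:], shuffled[:n_dev]


examples = [f"example {i}" for i in range(10)]

# Two separate calls stand in for two recipes sharing one split function.
train_a, dev_a = split_examples(examples, 0.2)
train_b, dev_b = split_examples(examples, 0.2)

same_dev = dev_a == dev_b
print(same_dev)
```

Under those assumptions, the two calls only agree when the input examples, the ratio and the seed are all the same. If annotations are added to the dataset between the two runs, the split can differ.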

Thank you for confirming.