Can I replicate "prodigy train --ner ds_<dataset_name> ./models --eval-split 0.25 -L" within Python?

hi @wertzhayden!

Not sure what you mean by "recreate" -- do you just mean you want to look at the raw Python code for prodigy train? You can view all the built-in recipes by running prodigy stats, finding the Location: path in the output, and opening the recipes folder there. The code for prodigy train is in train.py.

As you'll see, prodigy train is just a wrapper for spacy train. The issue you may run into is that prodigy train redoes the partitioning every time you use --eval-split, which isn't ideal -- that is, each run of prodigy train evaluates on a different held-out set, so you'll get a different result. That's why we typically only recommend prodigy train for quick-and-simple training but suggest using data-to-spacy then spacy train for more sophisticated workflows.
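
To make the partitioning point concrete, here's a minimal sketch in plain Python of what a reproducible split could look like. This is not Prodigy's own implementation: split_examples, its seed parameter, and the toy examples list are all hypothetical names for illustration, and examples just stands in for annotations exported from your dataset.

```python
import random


def split_examples(examples, eval_split=0.25, seed=0):
    """Shuffle with a fixed seed, then hold out a fraction for evaluation.

    Hypothetical helper for illustration only (not part of Prodigy or spaCy).
    """
    if not 0.0 < eval_split < 1.0:
        raise ValueError("eval_split must be between 0 and 1")
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # same seed -> same order every run
    n_eval = int(len(shuffled) * eval_split)
    # First n_eval items become the eval set, the rest the training set.
    return shuffled[n_eval:], shuffled[:n_eval]


# Toy stand-in for exported annotations (assumption: a list of dicts).
examples = [{"text": f"example {i}"} for i in range(8)]

train_a, dev_a = split_examples(examples, eval_split=0.25, seed=0)
train_b, dev_b = split_examples(examples, eval_split=0.25, seed=0)
```

With the seed fixed, both calls return the same train and eval sets, so scores from separate runs are comparable. In practice you'd get the same effect by running data-to-spacy once and reusing the files it writes for every spacy train run.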

There are a lot of support issues that discuss this in more detail. For example, in this one I show how to replicate prodigy train with spacy train using a sample project: