Hi – this is a totally valid question!
Since Prodigy focuses a lot on usage as a developer tool, the built-in `batch-train` commands were also designed with the development aspect in mind. They're optimised to train from Prodigy-style annotations and smaller datasets, include more complex logic to handle evaluation sets, and output more detailed training statistics.
Prodigy's `ner.batch-train` workflow also supports training from "incomplete" annotations out-of-the-box, e.g. a selection of examples biased by the score, or binary decisions collected with recipes like `ner.teach`. There's not really an easy way to train from the sparse data formats created with the active learning workflow using spaCy – at least not out-of-the-box.
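For example, a typical active learning loop with the v1.x recipes could look like the sketch below. The dataset name, source file, labels and output path are all placeholders, and the flags assume the Prodigy v1.x recipe signatures:

```bash
# Collect binary accept/reject decisions with the active learning recipe
prodigy ner.teach my_dataset en_core_web_sm news.jsonl --label PERSON,ORG

# Train from those annotations, holding out 20% of examples for evaluation
prodigy ner.batch-train my_dataset en_core_web_sm --output /tmp/ner-model \
    --n-iter 10 --eval-split 0.2
```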
spaCy's `spacy train` command, on the other hand, was designed for training from larger corpora, often annotated for several components (named entities, part-of-speech tags, dependencies etc.). It also supports more configuration options and settings to tune hyperparameters.
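For comparison, here's a rough sketch of a spaCy v2-style training run. The paths are placeholders, and the corpus is assumed to be in spaCy's JSON training format (e.g. produced by `spacy convert`):

```bash
# Train an NER-only pipeline from a large annotated corpus
python -m spacy train en /output ./train.json ./dev.json \
    --pipeline ner --n-iter 30
```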
TL;DR: If you want to run quick experiments, train from binary annotations, or export prototype models from your Prodigy annotations, use the `batch-train` recipes. If you want to train your production model on a large corpus of annotations, use `spacy train`.