Prodigy ner.batch-train vs spaCy train

Hi – this is a totally valid question :slightly_smiling_face:

Since Prodigy is primarily a developer tool, the built-in batch-train commands were also designed with development workflows in mind. They’re optimised to train from Prodigy-style annotations and smaller datasets, include more complex logic for handling evaluation sets, and output more detailed training statistics.
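
For reference, a minimal invocation might look something like this (the dataset name and output path are placeholders – see `prodigy ner.batch-train --help` for the exact flags available in your version):

```bash
# Train an NER model from the annotations stored in a Prodigy dataset,
# holding out 20% of the examples for evaluation.
# "my_ner_dataset" and the output path are placeholders.
prodigy ner.batch-train my_ner_dataset en_core_web_sm \
    --output /tmp/ner-model --n-iter 10 --eval-split 0.2
```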

Prodigy’s ner.batch-train workflow also supports training from “incomplete” annotations out-of-the-box – for example, a selection of examples biased by the model’s scores, or binary accept/reject decisions collected with recipes like ner.teach. There’s not really an easy way to train from the sparse data formats created by the active learning workflow using spaCy alone – at least not out-of-the-box.
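
To make that concrete, here’s a rough sketch of that workflow (the dataset name, label and source file are placeholders, and flags like `--no-missing` may vary between Prodigy versions – check `prodigy ner.batch-train --help`):

```bash
# Collect binary accept/reject decisions with active learning.
# "ner_binary", the ORG label and news.jsonl are all placeholders.
prodigy ner.teach ner_binary en_core_web_sm news.jsonl --label ORG

# Train from those sparse, binary annotations. We deliberately do NOT
# pass --no-missing here, so unannotated tokens are treated as missing
# values instead of being interpreted as "definitely not an entity".
prodigy ner.batch-train ner_binary en_core_web_sm --output /tmp/ner-binary-model
```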

spaCy’s spacy train command, on the other hand, was designed for training from larger corpora, often annotated for several components (named entities, part-of-speech tags, dependencies etc.). It also supports more configuration options and settings for tuning hyperparameters.
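
For comparison, a spaCy v2-style training run could look roughly like this (the corpus paths are placeholders, and the exact arguments depend on your spaCy version – see `spacy train --help`):

```bash
# Train an English NER model from corpora in spaCy's JSON training
# format. train.json, dev.json and the output path are placeholders.
python -m spacy train en /tmp/spacy-model train.json dev.json \
    --pipeline ner --n-iter 30
```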

TL;DR: If you want to run quick experiments, train from binary annotations, or export prototype models from your Prodigy annotations, use the batch-train recipes. If you want to train your production model on a large corpus of annotations, use spacy train.
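
And if you do outgrow the batch-train recipes, you’re not locked in – you can always export the raw annotations, e.g. with db-out, and convert them to spaCy’s training format yourself (the dataset name and file path are placeholders):

```bash
# Export the raw annotations as newline-delimited JSON, ready to be
# converted into spaCy's training format for use with spacy train.
prodigy db-out my_ner_dataset > annotations.jsonl
```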