ner.batch-train vs spaCy nlp.begin_training

Since Prodigy focuses a lot on usage as a developer tool, the built-in batch-train commands were also designed with development in mind. They’re optimised to train from Prodigy-style annotations and smaller datasets, include more complex logic for handling evaluation sets, and output more detailed training statistics.

Prodigy’s ner.batch-train workflow was also created under the assumption that annotations would be collected using ner.teach – i.e. a selection of examples biased by the score, with binary decisions only. There’s not really an easy way to train spaCy from the sparse data format created by the active learning workflow – at least not out of the box.

The ner.manual recipe is still pretty new, and we haven’t yet trained models entirely from annotations collected with this workflow ourselves. But there shouldn’t be a problem converting them to spaCy’s training format, and we’re thinking about including a recipe in a future version of Prodigy that takes care of this. (See this thread for a discussion of the topic.)
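In case it helps in the meantime, here’s a minimal sketch of that conversion. It assumes the usual Prodigy JSONL fields (`"text"`, `"spans"` with `"start"`/`"end"`/`"label"`, and `"answer"`) and spaCy’s `(text, {"entities": [...]})` training tuple format – the helper name `prodigy_to_spacy` and the example records are just for illustration:

```python
def prodigy_to_spacy(examples):
    """Convert Prodigy-style annotation dicts to spaCy training tuples."""
    train_data = []
    for eg in examples:
        # Only keep examples the annotator accepted
        if eg.get("answer") != "accept":
            continue
        # Prodigy spans use character offsets, same as spaCy expects
        entities = [
            (span["start"], span["end"], span["label"])
            for span in eg.get("spans", [])
        ]
        train_data.append((eg["text"], {"entities": entities}))
    return train_data


# Hypothetical records, shaped like Prodigy's JSONL output
examples = [
    {
        "text": "Apple is based in Cupertino.",
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"start": 18, "end": 27, "label": "GPE"},
        ],
        "answer": "accept",
    },
    {"text": "No entities here.", "spans": [], "answer": "reject"},
]

train_data = prodigy_to_spacy(examples)
print(train_data)
```

From there you should be able to feed `train_data` into a standard spaCy training loop with `nlp.update`. Note this only makes sense for complete annotations like ner.manual’s – for the binary ner.teach data, the sparsity problem described above still applies.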
