Difference between training on pre-annotated data using spaCy and Prodigy

I annotated my data using the manual recipe and then ran the following commands:

prodigy ner.batch-train dummy_data en_core_web_sm --output /home/user --no-missing

prodigy ner.teach new_dummy_data /home/user --label TESTONE,TESTTWO
  1. In the train command, why do we still use the en_core_web_sm spaCy model? When we train based on our annotations, shouldn't that argument be the name of the model we're saving to?

  2. What would be the difference between training in spaCy and in Prodigy? Will the results be the same if we use spaCy's training commands, or does using ner.batch-train actually make a difference?

  3. After the training, shouldn't we pass the dataset (the annotations from the manual process) or the model we just trained to the ner.teach command?

When you train a model, you usually need something to start with – even if it's just a blank language class like English that has the English tokenization rules, vocab etc. Sometimes you also want to start with word vectors to improve the accuracy. And you also often want to update an existing model further. So the model argument lets you pass in the base model to start with – either an existing model, or a blank model, like spacy.blank("en").to_disk("/model/path").
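For illustration, creating and saving a blank base model only takes a couple of lines – this is just a sketch using the spaCy 2.x API these recipes rely on, and "/model/path" is an example path, not a required location:

import spacy

# A blank English pipeline: tokenization rules, vocab and language defaults,
# but no trained components or weights yet.
nlp = spacy.blank("en")
nlp.to_disk("/model/path")  # example path – use whatever directory you like

You could then pass /model/path instead of en_core_web_sm as the model argument, and --output stays the directory the updated model is saved to.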

Under the hood, it'll always be calling nlp.update in some way, so it's doing the same thing. Prodigy's training commands are really just custom spaCy training loops. They're slightly more optimised for quick experiments and make it easy to train from incomplete and/or binary annotations (e.g. the ones you collect in ner.teach). If you train with spacy train on the other hand, that's more optimised to train from large, gold-standard corpora.
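To make that concrete, here's a rough sketch of the kind of loop both approaches boil down to. This is not the actual recipe code – the training examples, label and paths below are made up, and it uses the spaCy 2.x "simple training style":

import random
import spacy

# Made-up gold-standard examples: (text, {"entities": [(start_char, end_char, label)]})
TRAIN_DATA = [
    ("Apple is looking at buying a U.K. startup", {"entities": [(0, 5, "ORG")]}),
    ("San Francisco considers banning delivery robots", {"entities": [(0, 13, "GPE")]}),
]

nlp = spacy.blank("en")              # or load an existing model to update it further
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
for _, annots in TRAIN_DATA:
    for start, end, label in annots["entities"]:
        ner.add_label(label)

optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annots in TRAIN_DATA:
        # The core call – Prodigy's recipes and spacy train both end up here
        nlp.update([text], [annots], drop=0.2, sgd=optimizer, losses=losses)
    print(epoch, losses)

nlp.to_disk("/home/user")            # corresponds to the --output argument

The main differences are in what happens around this loop: how the examples are batched, how incomplete or binary annotations are handled, and how the data is evaluated.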

Yes, exactly – assuming you want to improve the model you just trained in the loop. That's the second argument on the command line, which in your example is /home/user (in a real-life scenario, you'd probably want to choose a better subdirectory here).