accuracy same for both prodigy train and spacy train

mystuff · January 8, 2020, 3:00pm

Hello,
I have more than 30k docs with labels and want to train for NER only. I ran both the prodigy train and spacy train commands with --pipeline ner and left the rest of the options at their defaults.

The accuracy is the same both ways. Why would I need to use spaCy over Prodigy?
I read this thread: Prodigy ner.batch-train vs Spacy train. How can I tune hyperparameters to improve the accuracy of the spacy train command?

ines · January 9, 2020, 11:11am

spaCy is the library used under the hood, so it makes sense to train with spaCy directly. Prodigy's training commands call into spaCy and do some Prodigy-specific stuff, like loading the datasets, merging examples, validation etc. So it's convenient to use for quick experiments. But you don't have to train with it. The fact that you get the same results when training with spaCy directly or via Prodigy's wrapper is a good thing – that's how it should be.

You can find some more details on the different approaches here: https://prodi.gy/docs/named-entity-recognition#training
Here are the docs for spaCy's train command: https://spacy.io/api/cli#train

mystuff · January 16, 2020, 5:01pm

Thanks for the clarification. I am using the spaCy model in my staging environment. It's pretty cool and 90% of the predictions are promising. But I found out that the model sometimes misses a few entities due to labelling issues. So I need to correct some of the 30k docs. What's the best way? I have the 30k docs in text, spans format. Can I load those 30k back in and do manual labelling?

ines · January 17, 2020, 11:51am

Yes, you can always export a dataset to a JSONL file and load it back in as the input data (e.g. in ner.manual). You'll then see the annotations and can correct them if needed.
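If the 30k docs live outside a Prodigy dataset, they first need to be written as JSONL, one JSON object per line with a "text" and a "spans" key, where each span has "start", "end" and "label". Here's a minimal sketch of that conversion. It assumes the existing data is available as (text, [(start, end, label), ...]) tuples with character offsets; that input shape and the helper names to_jsonl_line and write_jsonl are just made up for illustration, so adjust them to however your data is actually stored.

```python
import json


def to_jsonl_line(text, spans):
    """Serialize one example as a single JSON line with "text" and "spans".

    `spans` is assumed to be a list of (start_char, end_char, label) tuples.
    """
    record = {
        "text": text,
        "spans": [
            {"start": start, "end": end, "label": label}
            for start, end, label in spans
        ],
    }
    return json.dumps(record, ensure_ascii=False)


def write_jsonl(path, examples):
    """Write an iterable of (text, spans) pairs to a JSONL file."""
    with open(path, "w", encoding="utf-8") as f:
        for text, spans in examples:
            f.write(to_jsonl_line(text, spans) + "\n")


# Hypothetical example record, only here to show the expected shape.
examples = [
    ("Apple opened an office in Berlin.", [(0, 5, "ORG"), (26, 32, "GPE")]),
]
```

Calling write_jsonl("annotations.jsonl", examples) would then give you a file you can use as the input source for ner.manual (the file name is arbitrary).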

If you have more raw unannotated text, another thing you could do is use ner.correct with your pretrained model. This should make it pretty easy to collect more data, because the model's predictions will be pre-selected and you only have to correct the mistakes. See the Named Entity Recognition page of the Prodigy docs for an example.
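The model's predictions can also help with the docs that are already annotated: comparing them against the existing spans is one possible way to decide which examples to review first. The sketch below is only an illustration of that idea, not something built into Prodigy or spaCy. The helper names span_set and disagreements are hypothetical, and spans are assumed to be dicts with "start", "end" and "label".

```python
def span_set(spans):
    """Turn a list of span dicts into a set of (start, end, label) tuples."""
    return {(s["start"], s["end"], s["label"]) for s in spans}


def disagreements(annotated, predicted):
    """Return the spans that appear on only one side.

    "only_annotated": in the existing annotations, but not predicted.
    "only_predicted": predicted by the model, but not annotated
    (possibly an entity that was missed during labelling).
    """
    gold = span_set(annotated)
    pred = span_set(predicted)
    return {
        "only_annotated": sorted(gold - pred),
        "only_predicted": sorted(pred - gold),
    }


# With a loaded spaCy pipeline `nlp`, the predicted spans could be built as:
# predicted = [
#     {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
#     for ent in nlp(text).ents
# ]
```

Examples where either list is non-empty are candidates to look at again. Keep in mind that the model was trained on those same annotations, so it will tend to agree with them, and a disagreement is only a hint, not proof of a labelling mistake.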

mystuff · January 19, 2020, 10:52am

Thank you so much. I will correct the annotations with ner.manual and retrain the model to improve accuracy.
