I have more than 30k docs with labels and want to train for NER only. I ran both the prodigy train and spacy train commands with --pipeline ner and left the rest of the options at their defaults.
The accuracy is the same both ways. Why would I need to use spaCy over Prodigy?
I read the thread "Prodigy ner.batch-train vs Spacy train". How can I tune hyperparameters to improve the accuracy of the spacy train command?
spaCy is the library used under the hood, so it makes sense to train with spaCy directly. Prodigy's training commands call into spaCy and do some Prodigy-specific stuff, like loading the datasets, merging examples, validation etc. So it's convenient to use for quick experiments. But you don't have to train with it. The fact that you get the same results when training with spaCy directly or via Prodigy's wrapper is a good thing: that's how it should be.
You can find some more details on the different approaches here: https://prodi.gy/docs/named-entity-recognition#training
Here are the docs for spaCy's train command: https://spacy.io/api/cli#train
Thanks for the clarification. I am using the spaCy model in my staging environment. It's pretty cool, and 90% of the predictions look promising. But I found that the model sometimes misses a few entities because of labelling issues. So I need to correct some of the 30k docs. What's the best way? I have the 30k docs in text, spans format. Can I load those 30k back in and do manual labelling?
Yes, you can always export a dataset to a JSONL file and load it back in as the input data (e.g. in ner.manual). You'll then see the annotations and can correct them if needed.
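If your 30k docs are not in a Prodigy dataset yet, here is a minimal sketch of how you could write text/spans records to a JSONL file yourself (one JSON object per line). The helper names, the file name and the example record are made up for illustration, and I'm assuming your spans use character offsets with "start", "end" and "label" keys:

```python
import json


def write_jsonl(records, path):
    """Write one JSON object per line (the JSONL layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def span_texts(record):
    """Return the text covered by each span, as a quick check of the offsets."""
    return [record["text"][s["start"]:s["end"]] for s in record["spans"]]


# Hypothetical example record; your own data goes here.
records = [
    {
        "text": "Apple is opening an office in Berlin.",
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"start": 30, "end": 36, "label": "GPE"},
        ],
    },
]

if __name__ == "__main__":
    write_jsonl(records, "annotations.jsonl")  # hypothetical file name
    for record in read_jsonl("annotations.jsonl"):
        print(span_texts(record))
```

Checking the span texts before loading the file is worth doing, because offsets that are off by one are an easy way to end up with the kind of labelling issues you describe.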
If you have more raw unannotated text, another thing you could do is use ner.correct with your pretrained model. This should make it pretty easy to collect more data, because the model's predictions will be pre-selected and you only have to correct the mistakes. See here for an example: https://prodi.gy/docs/named-entity-recognition#manual-model
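To give a rough idea of what "pre-selected" means here: the model's entities are turned into spans on each example before you see it. This is only a sketch of that idea, not how Prodigy implements it. The function name is made up, and nlp is assumed to be a loaded spaCy pipeline with an NER component:

```python
def predictions_to_tasks(nlp, texts):
    """Turn a model's entity predictions into records in text/spans format.

    `nlp` is assumed to behave like a spaCy pipeline: `nlp.pipe(texts)`
    yields docs with `.text` and `.ents`, and each entity has
    `.start_char`, `.end_char` and `.label_`.
    """
    tasks = []
    for doc in nlp.pipe(texts):
        spans = [
            {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
            for ent in doc.ents
        ]
        tasks.append({"text": doc.text, "spans": spans})
    return tasks


# Usage with a real model (the path is a placeholder for your own model):
#   import spacy
#   nlp = spacy.load("/path/to/your/model")
#   tasks = predictions_to_tasks(nlp, ["Some raw, unannotated text."])
```

With ner.correct you don't need to write this yourself. It's mainly useful if you want to inspect what the model predicts on your raw text before you start annotating.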
Thank you so much. I will correct the annotations using ner.manual and retrain the model to improve accuracy.