Train spaCy's 'en_core_web_md' manually using ner.manual

ameyn21 · October 17, 2019, 11:06am

I tried training spaCy's 'en_core_web_md' model to identify diseases using ner.teach, with text from the Twitter, Times Now and Guardian APIs.
But the model has a very low accuracy of 40% and marks literally anything as a 'DISEASE' entity.
Tokens such as to, from, She, commas and colons are also predicted as 'DISEASE' entities.

I'm not sure why this happens, but I have a feeling the problem might be that the dataset has very few sentences that contain diseases and many more sentences with no diseases.

Now, I am planning to create my own dataset and I need some guidance here.

How many sentences containing diseases will I need for training?
I want to train the model manually using ner.manual. How do I do it?

Thanks

ines · October 18, 2019, 10:00am

Hi! You might find the NER annotation flowchart helpful, which should answer your main questions and give you some inspiration for what to try.

It might make sense to start off with a blank model instead of the pre-trained NER component of the en_core_web_md model. If there are entity types you want to keep (like PERSON), you can use the ner.make-gold recipe with the existing labels plus DISEASE. This will pre-highlight the existing predictions so you can correct them and manually add the annotations for your new DISEASE label. When you train your model later on, make sure to set the --no-missing flag to tell spaCy that the annotated spans are complete and that unannotated tokens are not part of any entity.
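
To make the effect of --no-missing a bit more concrete, here is a minimal sketch in plain Python. It does not call spaCy or Prodigy; the function name tag_tokens, the example sentence and the span are all made up for illustration, and the "-" placeholder for unknown tokens is only an assumption borrowed from the way missing values are usually written in BILUO-style tags. The idea is that with complete annotations every unannotated token counts as "O" (outside any entity), whereas with incomplete annotations the same tokens carry no information, so nothing tells the model that words like "She" are not a DISEASE.

```python
# Toy illustration only: plain Python, no spaCy or Prodigy calls.
# tag_tokens, the sentence and the span below are hypothetical.

def tag_tokens(tokens, spans, no_missing):
    """Return one label per token.

    spans is a list of (start_token, end_token_exclusive, label) tuples.
    Unannotated tokens become "O" (explicitly not an entity) when
    no_missing is True, and "-" (unknown) otherwise.
    """
    default = "O" if no_missing else "-"
    tags = [default] * len(tokens)
    for start, end, label in spans:
        for i in range(start, end):
            tags[i] = label
    return tags


tokens = ["She", "was", "diagnosed", "with", "type", "2", "diabetes", "."]
spans = [(4, 7, "DISEASE")]  # "type 2 diabetes", annotated by hand

# Complete annotation: every other token is a known negative example.
complete = tag_tokens(tokens, spans, no_missing=True)

# Incomplete annotation: every other token is simply unknown.
partial = tag_tokens(tokens, spans, no_missing=False)

for token, c, p in zip(tokens, complete, partial):
    print(f"{token:10} complete={c:8} partial={p}")
```

In the complete version, "She", "with" and the final period are explicit negative examples, which is the kind of evidence a model needs to stop predicting DISEASE for function words and punctuation.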
