Pre-train spaCy NER for healthcare data

Apache cTAKES does a great job of identifying NER labels focused on healthcare data. Can I take the output NER labels (pre-processed) from Apache cTAKES, feed them to spaCy's word embedding model using some JSON format, and create a general representation for accurate classification problems?

http://healthnlp.github.io/examples/

If you have an initial NER system that’s doing a good job, I would suggest creating a custom recipe which uses Prodigy to mark the suggested annotation as correct or incorrect. This can help you quickly create a vetted training data set which you can use to train spaCy or another tool.
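For example, a minimal custom recipe could look something like the sketch below. This is only illustrative: `ctakes_annotate` is a hypothetical helper you would write yourself to call your cTAKES pipeline and return character-offset spans, and the recipe and file names are made up.

```python
# Sketch of a custom Prodigy recipe that streams in text, attaches entity
# spans suggested by cTAKES, and asks you to accept or reject each example.
import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe("ner.ctakes-correct")
def ctakes_correct(dataset, source):
    def add_ctakes_spans(stream):
        for eg in stream:
            # ctakes_annotate() is a placeholder: call your cTAKES service here
            # and return a list of {"start", "end", "label"} dicts with
            # character offsets into eg["text"].
            eg["spans"] = ctakes_annotate(eg["text"])
            yield eg

    stream = JSONL(source)          # a JSONL file with one {"text": "..."} per line
    stream = add_ctakes_spans(stream)
    return {
        "dataset": dataset,         # dataset the accept/reject answers are saved to
        "stream": stream,
        "view_id": "ner",           # show highlighted spans for binary feedback
    }
```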

It might also be possible to use the NER model in Apache cTAKES directly in Prodigy's ner.teach recipe. However, this might also be a bit difficult: spaCy supports a fairly sophisticated training procedure that lets it learn from sparse annotations, where it doesn't know the fully correct annotation for the sentences in the training data. Achieving the same result with another model may or may not be easy.
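To make the "sparse annotations" point concrete, here's a rough spaCy v2-style sketch. The example sentence, the MEDICATION label, and the base model are just placeholders, not anything produced by cTAKES.

```python
# Update a pretrained model on a partially annotated sentence. "-" marks
# tokens whose entity annotation is unknown, so the model is only corrected
# on the span we trust (here "aspirin").
import spacy
from spacy.gold import GoldParse

nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("MEDICATION")         # example label, not in the base model

doc = nlp.make_doc("Patient was given aspirin for chest pain.")
biluo = ["-", "-", "-", "U-MEDICATION", "-", "-", "-", "-"]
gold = GoldParse(doc, entities=biluo)

# Create an optimizer without re-initialising the pretrained weights,
# and only update the NER component.
optimizer = ner.create_optimizer()
other_pipes = [p for p in nlp.pipe_names if p != "ner"]
with nlp.disable_pipes(*other_pipes):
    losses = {}
    nlp.update([doc], [gold], sgd=optimizer, losses=losses)
    print(losses)
```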

In summary, I would suggest the following workflow:

  1. Create a recipe that passes text through Apache cTAKES (along the lines of the sketch above). You can either ask about all the annotations on a sentence, or mark the annotations one-by-one.

  2. Use Prodigy’s ner.batch-train command to train a new spaCy model, which will be saved to disk.

  3. If you wish to improve accuracy further, you can use your newly trained spaCy model with the ner.teach command (example commands for steps 2 and 3 are sketched below).
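As a rough illustration of steps 2 and 3, the commands would look something like this. The dataset name, base model, output directory and source file are all placeholders.

```bash
# Step 2: train a spaCy model from the annotations vetted in step 1
prodigy ner.batch-train ctakes_ner_dataset en_core_web_sm --output ./healthcare-ner-model

# Step 3: improve the new model further with active learning on unlabelled text
prodigy ner.teach ctakes_ner_dataset ./healthcare-ner-model unlabelled_notes.jsonl --label MEDICATION
```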

Hope that helps — let us know how you go :slight_smile: