Hi everyone, it’s great to be part of the Prodigy community! I only started learning spaCy a few weeks ago and discovered Prodigy just yesterday. I hope to learn and grow with everyone’s opinions and advice.
I’m currently working on a domain-specific project that deals with entity recognition in the healthcare sector. Pre-trained models trained on medical texts would therefore be extremely useful and would speed up recognition of entities such as Cause of Illness, Treatment, Diagnosis, Duration, Impression, etc.
- Are there any recommendations for open-source pre-trained models that might be useful? I have seen open-source options such as medaCy and PubMed. What about the models in spaCy’s universe, such as Kindred, saber or scispaCy?
As mentioned, I will be extracting named entities from text files of doctors’ prescription and diagnostic notes. I am currently focusing on the entity “Causes of Illness”. For example, if someone has a fever, the cause could be a throat inflammation or a bite from a disease-carrying bug, or there may simply be no known reason.
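For reference, an annotated example for this entity could be stored as one JSON line with character offsets, similar to the `text`/`spans` layout Prodigy uses for NER annotations. A minimal sketch, assuming a made-up note and a hypothetical label name `CAUSE_OF_ILLNESS`:

```python
import json

# Made-up note and cause phrase, for illustration only
text = "Patient presents with fever due to throat inflammation."
cause = "throat inflammation"

# Character offsets of the cause phrase within the text
start = text.index(cause)
end = start + len(cause)

# One task in a text/spans layout; "CAUSE_OF_ILLNESS" is a hypothetical label name
task = {
    "text": text,
    "spans": [{"start": start, "end": end, "label": "CAUSE_OF_ILLNESS"}],
}

# Serialise as a single JSON line, as you would for a .jsonl file
line = json.dumps(task)
print(line)
```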
Here are three approaches I am considering:
A) On a superficial level, I could simply run terms.teach on known “Causes of Illness” terms, but with a pre-trained healthcare model instead of the en_core_web_lg model, to get better word embeddings and better similarity comparisons. Is ner.match useful here too?
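The reason a domain-specific model should help with A) is that terms.teach suggests new terms based on word-vector similarity to the seed terms, so the suggestions are only as good as the vectors. The core ranking idea can be sketched with cosine similarity in plain Python. The tiny three-dimensional vectors below are invented purely for illustration; real vectors would come from whichever pre-trained model is loaded, and this sketch leaves out the interactive accept/reject loop.

```python
import math

# Invented toy vectors; a real setup would take these from the model's word vectors
VECTORS = {
    "inflammation": [0.9, 0.1, 0.0],
    "infection": [0.8, 0.2, 0.1],
    "bite": [0.6, 0.0, 0.4],
    "ibuprofen": [0.1, 0.9, 0.0],
    "tuesday": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def suggest_terms(seeds, vectors, top_n=3):
    """Rank non-seed terms by similarity to the mean of the seed vectors."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    dim = len(seed_vecs[0])
    target = [sum(v[i] for v in seed_vecs) / len(seed_vecs) for i in range(dim)]
    candidates = [
        (term, cosine(target, vec))
        for term, vec in vectors.items()
        if term not in seeds
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_n]

print(suggest_terms(["inflammation"], VECTORS))
```

With general-purpose vectors, medical terms may sit close to unrelated words, which is why swapping in vectors trained on medical text can change what gets suggested.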
B) I could possibly use pattern matching (though I’m not sure how) to identify phrases that typically precede a “Cause of Illness”, such as “because”, “due to” and “as a result of”. How can I then teach the model to focus on the text following these phrases when tagging “Causes of Illness”?
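One way to prototype the trigger-phrase idea in B) is a plain regular expression that captures the words following “because”, “due to” or “as a result of” up to the next punctuation mark. This is a heuristic candidate generator under that assumption, not a trained model; the captured spans would still need to be reviewed during annotation. The trigger list and the example note are illustrative.

```python
import re

# Trigger phrases from the question; illustrative, not exhaustive
TRIGGERS = ["because of", "because", "due to", "as a result of"]

# Try longer triggers first so "because of" is preferred over "because"
trigger_alt = "|".join(re.escape(t) for t in sorted(TRIGGERS, key=len, reverse=True))

# Capture the text after a trigger, up to the next punctuation mark or end of text
CAUSE_RE = re.compile(rf"\b(?:{trigger_alt})\s+([^.,;:()]+)", re.IGNORECASE)

def candidate_causes(text):
    """Return (start, end, phrase) tuples for text that follows a trigger phrase."""
    results = []
    for match in CAUSE_RE.finditer(text):
        phrase = match.group(1).rstrip()
        if not phrase:
            continue  # nothing usable after the trigger
        start = match.start(1)
        results.append((start, start + len(phrase), phrase))
    return results

note = "Fever due to throat inflammation, possibly as a result of an insect bite."
for start, end, phrase in candidate_causes(note):
    print(start, end, phrase)
```

The same triggers could also be written as spaCy `Matcher` token patterns such as `[{"LOWER": "due"}, {"LOWER": "to"}]`, which match on tokens rather than raw characters; the regex version just keeps this sketch dependency-free.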
C) Could I associate two labels with each other? For example, the presence of the label “ILLNESS” might increase the likelihood that “Causes of Illness” is also present. Could the dependency parser be used here (dep.teach, etc.)?
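Before building the link in C) into a model, it may be worth checking whether the association actually holds in annotated data. A minimal sketch, assuming examples in the `text`/`spans` layout and hypothetical label names `ILLNESS` and `CAUSE_OF_ILLNESS`; the four records are invented to illustrate the calculation:

```python
def label_stats(examples, given, target):
    """Return (P(target), P(target | given)) estimated from span labels."""
    n_given = n_both = n_target = 0
    for ex in examples:
        labels = {span["label"] for span in ex.get("spans", [])}
        if target in labels:
            n_target += 1
        if given in labels:
            n_given += 1
            if target in labels:
                n_both += 1
    p_target = n_target / len(examples) if examples else 0.0
    p_target_given = n_both / n_given if n_given else 0.0
    return p_target, p_target_given

# Invented examples; real spans would also carry "start"/"end" offsets
examples = [
    {"text": "Fever due to throat inflammation.",
     "spans": [{"label": "ILLNESS"}, {"label": "CAUSE_OF_ILLNESS"}]},
    {"text": "Rash after an insect bite.",
     "spans": [{"label": "ILLNESS"}, {"label": "CAUSE_OF_ILLNESS"}]},
    {"text": "Headache, cause unknown.", "spans": [{"label": "ILLNESS"}]},
    {"text": "Follow-up in two weeks.", "spans": []},
]

p, p_given = label_stats(examples, "ILLNESS", "CAUSE_OF_ILLNESS")
print(f"P(cause) = {p:.2f}, P(cause | illness) = {p_given:.2f}")
```

A clear gap between the two numbers would support the intuition; it does not by itself decide how the model should use it.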
If anyone has suggestions, please fire away! Thanks for taking the time to read this; any advice is appreciated.