Demographics entity extraction from clinical trial eligibility criteria

Hi,
I recently joined the Prodigy community. I've started working on a project to extract demographics data, medical conditions, and drugs from clinical trial eligibility criteria in order to build a semantic relationship graph. I was able to extract medical conditions, drugs, and their attributes using the scispaCy model. Now I also want to extract demographics data. Could you please suggest an available pre-trained model or other ways to extract it?
In addition, I'd like to categorize the disease type based on attributes such as severity level (mild, moderate, or severe) and duration (chronic or short term). Any guidance would be helpful.
Thanks in advance!

Hi! It's difficult to give a definitive answer, because the approach that works best will depend on your data and on how you break down the demographics you want to extract into categories. In this case, you may want to experiment with some manual annotation first (possibly with patterns that pre-select entities for you) and then train a separate entity recognizer. The usage guide on NER should be a good place to start.
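
To make the patterns idea a bit more concrete, here is a minimal sketch that generates a patterns file in the JSONL format Prodigy reads (one `{"label": ..., "pattern": [...]}` object per line, using spaCy's token pattern syntax). The labels (`AGE`, `SEX`, `SEVERITY`, `DURATION`), the term lists, and the helper names `build_patterns` and `to_jsonl` are all assumptions made up for illustration, so you'd want to replace them with whatever categories and vocabulary actually show up in your eligibility criteria.

```python
import json

# Hypothetical term lists for illustration; replace them with the
# vocabulary you actually see in your eligibility criteria.
TERMS = {
    "SEX": ["male", "female", "men", "women"],
    "SEVERITY": ["mild", "moderate", "severe"],
    "DURATION": ["chronic", "acute"],
}


def build_patterns(terms):
    """Turn {label: [term, ...]} into token-based match patterns."""
    patterns = []
    for label, words in terms.items():
        for word in words:
            # One token dict per whitespace-separated word,
            # matched on the lowercase form of each token.
            tokens = [{"LOWER": part} for part in word.lower().split()]
            patterns.append({"label": label, "pattern": tokens})
    # A number-like token followed by "years", e.g. "18 years".
    # This AGE phrasing is an assumption about the data.
    patterns.append(
        {"label": "AGE", "pattern": [{"LIKE_NUM": True}, {"LOWER": "years"}]}
    )
    return patterns


def to_jsonl(patterns):
    """Serialize the patterns as newline-delimited JSON."""
    return "\n".join(json.dumps(p) for p in patterns) + "\n"


if __name__ == "__main__":
    # Print the JSONL so it can be redirected into a patterns file.
    print(to_jsonl(build_patterns(TERMS)), end="")
```

You could redirect the output to a file such as `demographics_patterns.jsonl` (a placeholder name) and pass it to a manual NER recipe via the `--patterns` argument, so matches are pre-highlighted and you only need to correct them. Simple term lists like these will miss a lot of real phrasing, so treat them as a way to speed up annotation rather than as the extractor itself.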

If you haven't seen it already, also check out the medical tag on the forum for discussion related to training models for biomedical use cases: Topics tagged medical

Also, this is a recent project published by researchers at Oxford. It's built on top of spaCy and trained on data annotated with Prodigy. They published a detailed blog post and a paper on the approaches they chose and the different considerations involved. If you haven't seen it yet, it's definitely an interesting read and should be pretty relevant to you.

@ines Thanks for your prompt reply. I'll try and get back to you.