Synonyms

aph61 · August 10, 2019, 2:25pm

Hi,

I have a small data set of 10 data points on "Lipitor 5mg" (same as Atorvastatin, but branded), 5 data points on "Atorvastatin 10mg", and 5 on "Atorvastatin 20mg". When I do a regular term.teach they are all identified as a drug, but not as a statin, since I have too few data points.

{"label": "DRUG", "pattern": [{"lower": "lipitor"}, {"lower": "5mg"}]}
{"label": "DRUG", "pattern": [{"lower": "atorvastatin"}, {"lower": "10mg"}]}
{"label": "DRUG", "pattern": [{"lower": "atorvastatin"}, {"lower": "20mg"}]}

I'd like to map all of these drugs to the single entry Atorvastatin in the vocabulary that is used further in the analysis. I tried:

{"name": "Atorvastatin", "label": "DRUG", "patterns":
[
[{"lower": "lipitor"}, {"lower": "5mg"}],
[{"lower": "atorvastatin"}, {"lower": "10mg"}],
[{"lower": "atorvastatin"}, {"lower": "20mg"}]
]}

but I don't know whether this gives me what I need, and I could not find a good way to verify it. In the ideal case, I'd end up with a sentence that contains only the drug Atorvastatin, nothing else.

suggestions?

thanks

Andreas

ines · August 13, 2019, 9:51am

I hope I understand the question correctly. But I think you probably want to focus on training your model to recognise DRUG (any drug) until it's reasonably good at it. You can then add a rule-based component on top later that normalises them and groups them into subtypes. I actually outlined a very similar approach here:
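As a rough illustration of the rule-based normalisation step described above (a sketch only, not the approach from the linked thread), something like the following could run on the text of each predicted DRUG span. The lookup table `CANONICAL_DRUGS`, the function name `normalise_drug`, and the dosage regex are all hypothetical and would need to be adapted to the real vocabulary:

```python
import re

# Hypothetical lookup table: lowercased surface form -> canonical vocabulary entry.
CANONICAL_DRUGS = {
    "lipitor": "Atorvastatin",
    "atorvastatin": "Atorvastatin",
}

# Matches a trailing dosage in milligrams, e.g. "5mg", "10 mg" or "20MG".
DOSAGE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*mg\s*$", re.IGNORECASE)


def normalise_drug(entity_text):
    """Split a DRUG span into (canonical name or None, dosage in mg or None)."""
    text = entity_text.strip()
    dosage = None
    match = DOSAGE_RE.search(text)
    if match:
        dosage = float(match.group(1))
        # Drop the dosage so only the drug name is looked up.
        text = text[:match.start()]
    # Unknown names map to None rather than being guessed.
    canonical = CANONICAL_DRUGS.get(text.strip().lower())
    return canonical, dosage


for span in ["Lipitor 5mg", "Atorvastatin 10mg", "Atorvastatin 20mg"]:
    print(span, "->", normalise_drug(span))
```

In a spaCy pipeline, this could be applied to `ent.text` for every entity in `doc.ents` whose `ent.label_` is `DRUG`, so the statistical model only has to learn the single DRUG label and the mapping to Atorvastatin stays in a table that is easy to inspect and extend.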
