How to approach NER "sub-entities" task?

I've trained the NER pipeline in my spaCy workflow to identify a new entity label, DRUG, very well. The next task is to identify logically "downstream" related entities, like DOSAGE. What I mean is that, for my purposes, this latter entity requires a DRUG entity (but not all DRUG entities have a DOSAGE entity).

"I prescribed the patient Lipitor (ent: DRUG) 20 mg (sub_ent: DOSAGE) daily."
"Patient is allergic to Acetaminophen (ent: DRUG)."
"Systolic blood pressure of 132 mmHg (ent: TEST)"

What's the best way to structure the pipeline to accomplish this task? My intuition is to create a new pipeline component and use the rule-based matcher to find nearby numeric values and measurements for drugs identified in the NER pipe upstream.

Alternatively, I could use a flat NER scheme that treats DOSAGE as a first-class entity like DRUG (but this seems messier and more prone to false positives).

Another idea is to have a later pipeline component that, instead of a rule-based matcher, is itself an NER component from a base spaCy language model, and that also only works on those entities labeled DRUG by the upstream NER component.

Thanks for the help!

You could also train a custom parser like in this example. So you keep DRUG as an entity but DOSAGE is a relation to that entity. This example might inspire you as well.

If you find those approaches relevant then I suggest you follow this thread that I just started on the subject.

Hi Max,

First, a terminological tip: you'll probably have an easier time finding information and papers about this if you search for terms like relation extraction, information extraction, or slot-filling.

I think using the rule-based matcher will make sense, and is likely to be the easiest overall. Of the other approaches, I guess making it a "first class" entity and using rules to constrain invalid outputs would be possible too.
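
To make the rule-based idea concrete, here is a minimal, library-independent sketch of the logic such a component might apply: for each DRUG span found upstream, look a few tokens ahead for a number followed by a unit. The function name `find_dosages`, the `UNITS` set, and the `window` parameter are all hypothetical and chosen for illustration. In spaCy, the same rule would more likely be expressed as token patterns for the `Matcher` inside a custom pipeline component placed after the NER.

```python
import re

# Hypothetical unit list for illustration; extend it for your own data.
UNITS = {"mg", "mcg", "g", "ml", "iu", "units"}
NUMBER = re.compile(r"^\d+(\.\d+)?$")


def find_dosages(tokens, drug_spans, window=3):
    """Map each DRUG span to a following DOSAGE span, or None.

    tokens: list of token strings.
    drug_spans: list of (start, end) token offsets, end exclusive.
    window: how many tokens after the drug may start a dosage.
    """
    results = {}
    for start, end in drug_spans:
        results[(start, end)] = None
        # Search for "<number> <unit>" starting shortly after the drug.
        for i in range(end, min(end + window, len(tokens) - 1)):
            if NUMBER.match(tokens[i]) and tokens[i + 1].lower() in UNITS:
                results[(start, end)] = (i, i + 2)
                break
    return results


tokens = "I prescribed the patient Lipitor 20 mg daily .".split()
# Assume the upstream NER labeled token 4 ("Lipitor") as DRUG.
print(find_dosages(tokens, [(4, 5)]))
```

Because the search is anchored on DRUG spans, a measurement with no drug nearby, like the blood pressure reading in your third example, is never considered as a DOSAGE candidate.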

You should consider annotating some evaluation data as the most generally useful step, since it'll help no matter which way you end up going.
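
As a rough sketch of how that evaluation data could be used, you could compare predicted (drug, dosage) pairs against your annotated pairs and compute precision and recall. The function name `precision_recall` and the example pairs below are made up for illustration; they are not real results.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall over sets of (drug, dosage) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations and predictions, for illustration only.
gold = [("Lipitor", "20 mg")]
predicted = [("Lipitor", "20 mg"), ("Acetaminophen", "500 mg")]
print(precision_recall(predicted, gold))
```

The same function works regardless of whether the pairs come from matcher rules, a flat NER scheme, or a trained relation model, which makes the approaches directly comparable.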
