I've trained the NER pipeline in my spaCy workflow to identify a new entity label, DRUG, very well. The next task is to identify logically "downstream" related entities, like DOSAGE. What I mean is that, for my purposes, a DOSAGE entity requires a DRUG entity (though not all DRUG entities have a DOSAGE entity).
Examples:
"I prescribed the patient Lipitor (ent: DRUG) 20 mg (sub_ent: DOSAGE) daily."
"Patient is allergic to Acetaminophen (ent: DRUG)."
"Systolic blood pressure of 132 mmHg (ent: TEST)."
What's the best way to structure the pipeline to accomplish this task? My intuition is to create a new pipeline component and use the rule-based matcher to find nearby numeric values and measurements for drugs identified in the NER pipe upstream.
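To make that intuition concrete, here is a minimal sketch of what I have in mind, assuming spaCy v3. Everything named here is hypothetical and only for illustration: the `dosage_linker` component, the `drug` span extension, the `UNITS` list, and the `WINDOW` size. An `entity_ruler` stands in for my trained NER component so the snippet runs on its own.

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.tokens import Span
from spacy.util import filter_spans

UNITS = ["mg", "mcg", "g", "ml", "units"]  # assumed unit list, not exhaustive
WINDOW = 5  # assumed max number of tokens between a DRUG and its DOSAGE

# Hypothetical extension that points a DOSAGE span back to its DRUG span.
Span.set_extension("drug", default=None, force=True)


@Language.factory("dosage_linker")
def create_dosage_linker(nlp, name):
    matcher = Matcher(nlp.vocab)
    # A number-like token followed by a known unit, e.g. "20 mg".
    matcher.add("DOSAGE", [[{"LIKE_NUM": True}, {"LOWER": {"IN": UNITS}}]])

    def dosage_linker(doc):
        drugs = [ent for ent in doc.ents if ent.label_ == "DRUG"]
        dosages = []
        for _, start, end in matcher(doc):
            # Keep the match only if a DRUG ends shortly before it.
            nearby = [d for d in drugs if 0 <= start - d.end <= WINDOW]
            if not nearby:
                continue
            dosage = Span(doc, start, end, label="DOSAGE")
            dosage._.drug = max(nearby, key=lambda d: d.end)  # closest DRUG
            dosages.append(dosage)
        # doc.ents cannot hold overlapping spans, so resolve overlaps first.
        doc.ents = filter_spans(list(doc.ents) + dosages)
        return doc

    return dosage_linker


nlp = spacy.blank("en")
# Stand-in for the trained NER component (assumption for this sketch).
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "DRUG", "pattern": "Lipitor"},
    {"label": "DRUG", "pattern": "Acetaminophen"},
])
nlp.add_pipe("dosage_linker", last=True)

doc = nlp("I prescribed the patient Lipitor 20 mg daily.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent._.drug)
```

In the real pipeline the component would be added after the trained `ner` pipe instead of the ruler. Note that `filter_spans` prefers longer spans, so a DOSAGE match could displace an overlapping entity from an upstream component.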
Alternatively, I could use a flat NER scheme that treats DOSAGE as a first-class entity like DRUG (but this seems messier and more prone to false positives).
Another idea is to have a later pipeline component that, instead of a rule-based matcher, is itself an NER component from a base spaCy language model and that, too, only operates on the entities labeled DRUG by the upstream NER component.
Thanks for the help!