I've trained the NER component in my spaCy pipeline to identify a new entity label, DRUG, very well. The next task is to identify logically "downstream" related entities, like DOSAGE. What I mean is that, for my purposes, a DOSAGE entity requires a DRUG entity (but not all DRUG entities have a DOSAGE). For example:
"I prescribed the patient Lipitor (ent: DRUG) 20 mg (sub_ent: DOSAGE)"
"Patient is allergic to Acetaminophen (ent: DRUG)"
"Systolic blood pressure of 132 mmHg (ent:
What's the best way to structure the pipeline to accomplish this task? My intuition is to create a new pipeline component and use the rule-based matcher to find nearby numeric values and measurements for drugs identified in the NER pipe upstream.
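To make that first idea concrete, here is a minimal sketch of such a component, not a definitive implementation. It assumes spaCy v3. The component name `dosage_linker`, the unit list, the `MAX_GAP` window, and the `drug` span extension are all hypothetical choices of mine, and an `entity_ruler` stands in for the trained DRUG model so the snippet runs without it.

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.tokens import Span
from spacy.util import filter_spans

# Hypothetical unit list and window size; tune both for real data.
DOSAGE_UNITS = ["mg", "g", "mcg", "ml", "iu"]
MAX_GAP = 3  # max tokens allowed between the end of a DRUG and its dosage

# Hypothetical custom attribute linking a DOSAGE span to its DRUG span.
Span.set_extension("drug", default=None, force=True)


@Language.factory("dosage_linker")
def create_dosage_linker(nlp, name):
    matcher = Matcher(nlp.vocab)
    # A number-like token followed by a unit, e.g. "20 mg".
    pattern = [{"LIKE_NUM": True}, {"LOWER": {"IN": DOSAGE_UNITS}}]
    matcher.add("DOSAGE", [pattern])

    def dosage_linker(doc):
        drugs = [ent for ent in doc.ents if ent.label_ == "DRUG"]
        new_ents = list(doc.ents)
        for _, start, end in matcher(doc):
            # Keep the match only if a DRUG entity ends shortly before it.
            for drug in drugs:
                if 0 <= start - drug.end <= MAX_GAP:
                    dosage = Span(doc, start, end, label="DOSAGE")
                    dosage._.drug = drug
                    new_ents.append(dosage)
                    break
        # Resolve any overlaps before writing the entities back.
        doc.ents = filter_spans(new_ents)
        return doc

    return dosage_linker


# Stand-in for the trained DRUG model (assumption: replace with your own NER).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "DRUG", "pattern": "Lipitor"},
    {"label": "DRUG", "pattern": "Acetaminophen"},
])
nlp.add_pipe("dosage_linker")  # runs after the component that sets DRUG

doc = nlp("I prescribed the patient Lipitor 20 mg.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent._.drug)
```

Because the matcher result is only accepted when a DRUG span precedes it within the window, a bare measurement with no drug nearby never becomes a DOSAGE.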
Alternatively, I could use a flat NER scheme that treats DOSAGE as a first-class entity like DRUG (but this seems messier and more prone to false positives).
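If I went the flat route, one way to limit those false positives would be a post-processing step that drops any DOSAGE without a DRUG just before it. This is only a sketch under assumptions: the function name `drop_orphan_dosages` and the `max_gap` window are hypothetical, and the entities are set by hand to imitate what a flat model might predict.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab


def drop_orphan_dosages(doc: Doc, max_gap: int = 3) -> Doc:
    """Remove DOSAGE entities that have no DRUG ending within max_gap tokens before them."""
    drugs = [ent for ent in doc.ents if ent.label_ == "DRUG"]
    kept = []
    for ent in doc.ents:
        if ent.label_ == "DOSAGE":
            has_drug = any(0 <= ent.start - drug.end <= max_gap for drug in drugs)
            if not has_drug:
                continue  # orphan dosage: likely a false positive
        kept.append(ent)
    doc.ents = kept
    return doc


# Hand-built doc imitating flat NER output (assumption: no trained model here).
words = ["Lipitor", "20", "mg", "daily", ";", "the", "sample", "weighed", "5", "mg", "."]
doc = Doc(Vocab(), words=words)
doc.ents = [
    Span(doc, 0, 1, label="DRUG"),
    Span(doc, 1, 3, label="DOSAGE"),
    Span(doc, 8, 10, label="DOSAGE"),  # no drug nearby
]

doc = drop_orphan_dosages(doc)
print([(ent.text, ent.label_) for ent in doc.ents])
```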
Another idea is to have a later pipeline component that, instead of a rule-based matcher, is itself an NER component from a base spaCy language model and that likewise only looks at the context around entities labeled DRUG by the upstream NER component.
Thanks for the help!