How to approach NER "sub-entities" task?

mefrem · October 19, 2020, 6:49pm

I've trained the NER pipeline in my spaCy workflow to identify a new entity label, DRUG, very well. The next task is to identify logically "downstream" related entities, like DOSAGE. What I mean is that, for my purposes, a DOSAGE entity requires a DRUG entity (but not all DRUG entities have a DOSAGE entity).

Examples:
"I prescribed the patient Lipitor (ent: DRUG) 20 mg (sub_ent: DOSAGE) daily."
"Patient is allergic to Acetaminophen (ent: DRUG)."
"Systolic blood pressure of 132 mmHg (ent: TEST)."

What's the best way to structure the pipeline to accomplish this task? My intuition is to create a new pipeline component and use the rule-based matcher to find nearby numeric values and measurements for drugs identified in the NER pipe upstream.

Alternatively, I could use a flat NER scheme that treats DOSAGE as a first-class entity like DRUG (but this seems messier and more prone to false positives).

Another idea is to have a later pipeline component that, instead of using the rule-based matcher, is itself an NER component from a base spaCy language model and, like the matcher approach, only works on the entities labeled DRUG by the upstream NER component.

Thanks for the help!

nix411 · October 20, 2020, 7:37am

You could also train a custom parser like in this example. So you keep DRUG as an entity but DOSAGE is a relation to that entity. This example might inspire you as well.

If you find those approaches relevant then I suggest you follow this thread that I just started on the subject.

honnibal · October 23, 2020, 1:35pm

Hi Max,

First, a terminological tip: you'll probably find it easier to find information and papers about this if you're looking for terms like relation extraction, information extraction, or slot-filling.

I think using the rule-based matcher will make sense, and is likely to be the easiest overall. Of the other approaches, I guess making it a "first class" entity and using rules to constrain invalid outputs would be possible too.
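As a rough illustration of the rule-based route, here is a minimal sketch, not a definitive implementation. It assumes the DRUG entities are already set on the doc by the upstream NER (here they are set by hand on a blank pipeline), that a dosage looks like a number followed by a unit, and that the dosage follows the drug within a couple of tokens. The unit list, `MAX_GAP`, and the `link_dosages` helper are all made up for the example.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

# Blank English pipeline (tokenizer only); stands in for the trained pipeline.
nlp = spacy.blank("en")

# Assumed dosage shape: a number-like token followed by a unit token.
UNITS = ["mg", "mcg", "g", "ml"]  # illustrative, not exhaustive
matcher = Matcher(nlp.vocab)
matcher.add("DOSAGE", [[{"LIKE_NUM": True}, {"LOWER": {"IN": UNITS}}]])

MAX_GAP = 2  # assumed: max tokens allowed between the drug and its dosage


def link_dosages(doc):
    """Return (drug_text, dosage_text or None) for every DRUG entity in doc."""
    dosages = sorted(
        (doc[start:end] for _, start, end in matcher(doc)),
        key=lambda span: span.start,
    )
    pairs = []
    for ent in doc.ents:
        if ent.label_ != "DRUG":
            continue
        # Take the first dosage that starts shortly after the drug mention.
        linked = next(
            (d for d in dosages if 0 <= d.start - ent.end <= MAX_GAP), None
        )
        pairs.append((ent.text, linked.text if linked is not None else None))
    return pairs


doc = nlp("I prescribed the patient Lipitor 20 mg daily.")
doc.ents = [Span(doc, 4, 5, label="DRUG")]  # normally set by the NER component
print(link_dosages(doc))
```

Because a dosage is only reported when it sits next to a DRUG entity, a measurement such as "132 mmHg" in a sentence with no drug is never returned, which is the false-positive concern raised about the flat scheme.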

You should consider annotating some evaluation data as the most generally useful step, since it'll help no matter which way you end up going.
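To make the evaluation idea concrete, here is a small sketch of scoring predicted drug/dosage links against hand-annotated ones. It assumes each link is represented as a `(doc_id, drug, dosage)` tuple; that representation, the `prf` helper, and the toy data are all invented for illustration.

```python
def prf(gold, predicted):
    """Precision, recall and F1 over two collections of (doc_id, drug, dosage) links."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data, made up for the example: one correct link, one wrong dosage.
gold = [(0, "Lipitor", "20 mg"), (1, "Metformin", "500 mg")]
predicted = [(0, "Lipitor", "20 mg"), (1, "Metformin", "50 mg")]
print(prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

The same score can be computed for each candidate approach (matcher rules, flat NER plus constraints, a custom parser), so they can be compared on the same annotated examples.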
