Nested labels for NER

ines · August 27, 2018, 11:20am

One thing you could do is add a second pipeline component after the entity recognizer that looks for DRUG entities and then sets a custom attribute on those entities specifying a list of subtypes (e.g. ent._.entity_subtypes).

Here’s an example of a rule-based solution – but you could obviously also swap out the dictionary for a statistical solution, or a combination.

from spacy.tokens import Span

# dictionary of lowercase entities mapped to subtypes
DRUG_SUBTYPES = {
    'citalopram': ['ANTIDEPRESSANT', 'SOMETHING_ELSE'],
    'lexapro': ['ANTIDEPRESSANT'],
    # etc.
}

# register global span._.entity_subtypes extension
Span.set_extension('entity_subtypes', default=None)

def assign_subtypes(doc):
    # this function will be added after the NER in the pipeline
    for ent in doc.ents:
        if ent.label_ == 'DRUG':
            # look up lowercased entity text (the dictionary keys are
            # lowercase) and set custom attribute; None if not found
            ent._.entity_subtypes = DRUG_SUBTYPES.get(ent.text.lower())
    return doc
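As a rough sketch of the "combination" idea mentioned above: the lookup could try the dictionary first and only fall back to a statistical model when the entity isn't listed. The names lookup_subtypes and toy_fallback are hypothetical and not part of spaCy; toy_fallback is just a stand-in for whatever trained classifier you'd actually use.

```python
from typing import Callable, List, Optional

# same lowercase dictionary as in the rule-based example
DRUG_SUBTYPES = {
    'citalopram': ['ANTIDEPRESSANT', 'SOMETHING_ELSE'],
    'lexapro': ['ANTIDEPRESSANT'],
}


def lookup_subtypes(
    text: str,
    fallback: Optional[Callable[[str], List[str]]] = None,
) -> Optional[List[str]]:
    """Dictionary lookup first, optional statistical fallback second."""
    key = text.lower()
    if key in DRUG_SUBTYPES:
        return DRUG_SUBTYPES[key]
    if fallback is not None:
        # hypothetical: any callable that maps entity text to subtypes
        return fallback(key)
    return None


def toy_fallback(text: str) -> List[str]:
    # placeholder only; a real setup would call a trained model here
    return ['UNKNOWN_SUBTYPE']


print(lookup_subtypes('Citalopram'))
print(lookup_subtypes('aspirin'))
print(lookup_subtypes('aspirin', fallback=toy_fallback))
```

Inside assign_subtypes, you'd then call lookup_subtypes(ent.text, fallback=...) instead of DRUG_SUBTYPES.get.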

You could then use the component like this:

import spacy

nlp = spacy.load('/path/to/your/drugs/model')
nlp.add_pipe(assign_subtypes, after='ner')
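The snippet above uses the spaCy v2 API, where add_pipe takes the function itself. In spaCy v3, components are registered under a string name and added by that name. Here's a minimal sketch of the same idea for v3. Since the drugs model path isn't available here, it assumes a blank English pipeline with an entity ruler standing in for the trained NER; the patterns and the example sentence are made up for illustration.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span

DRUG_SUBTYPES = {
    'citalopram': ['ANTIDEPRESSANT', 'SOMETHING_ELSE'],
    'lexapro': ['ANTIDEPRESSANT'],
}

# guard so re-running the script doesn't raise on a duplicate extension
if not Span.has_extension('entity_subtypes'):
    Span.set_extension('entity_subtypes', default=None)


@Language.component('assign_subtypes')
def assign_subtypes(doc):
    for ent in doc.ents:
        if ent.label_ == 'DRUG':
            ent._.entity_subtypes = DRUG_SUBTYPES.get(ent.text.lower())
    return doc


# assumption: blank pipeline + entity ruler instead of a trained drugs model
nlp = spacy.blank('en')
ruler = nlp.add_pipe('entity_ruler')
ruler.add_patterns([
    {'label': 'DRUG', 'pattern': [{'LOWER': 'citalopram'}]},
    {'label': 'DRUG', 'pattern': [{'LOWER': 'lexapro'}]},
])
nlp.add_pipe('assign_subtypes', after='entity_ruler')

doc = nlp('She was switched from Citalopram to Lexapro.')
results = [(ent.text, ent.label_, ent._.entity_subtypes) for ent in doc.ents]
for row in results:
    print(row)
```

With a trained model, you'd load it via spacy.load and use after='ner' instead of after='entity_ruler'.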
