Get the start and end position of found named entities

oliverbj · May 19, 2020, 4:22pm

Hi there

I am very new to ML and to spaCy in general. I am trying to show the named entities found in an input text.

This is my method:

from collections import defaultdict
from pprint import pprint

import spacy


def run():
    nlp = spacy.load('en_core_web_sm')
    sentence = "Hi my name is Oliver!"
    doc = nlp(sentence)

    # Threshold for the confidence scores.
    threshold = 0.2
    beams = nlp.entity.beam_parse(
        [doc], beam_width=16, beam_density=0.0001)

    # Sum the score of every beam parse in which an entity appears.
    entity_scores = defaultdict(float)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                entity_scores[(start, end, label)] += score

    # Create a dict to store the output.
    ners = defaultdict(list)
    ners['text'] = str(sentence)

    for key in entity_scores:
        start, end, label = key
        score = entity_scores[key]
        if score > threshold:
            ners['extractions'].append({
                "label": str(label),
                "text": str(doc[start:end]),
                "confidence": round(score, 2)
            })

    pprint(ners)

The above method works fine, and will print something like:

'extractions': [{'confidence': 1.0,
                'label': 'PERSON',
                'text': 'Oliver'}],
'text': 'Hi my name is Oliver'})

So far so good. Now I am trying to get the actual position of the found named entity. In this case "Oliver".

The documentation lists ent.start_char and ent.end_char, but if I try to use them like this:

"start_position": doc.start_char,
"end_position": doc.end_char

I get the following error:

AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'start_char'

Can someone guide me in the right direction?
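
For context on the error: start_char and end_char are attributes of a Span (each entity in doc.ents is one), not of the Doc itself. The (start, end) pairs in the code above are token indices, so slicing doc[start:end] yields a Span that exposes those attributes. The sketch below illustrates how token indices map to character offsets without depending on spaCy. The regex tokenizer and the helper names tokenize_with_offsets and span_char_offsets are assumptions made for illustration, not spaCy's API.

```python
import re


def tokenize_with_offsets(text):
    """Return (token_text, start_char) pairs.

    A simplified stand-in for spaCy's tokenizer: words and single
    punctuation characters become separate tokens.
    """
    return [(m.group(), m.start()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def span_char_offsets(tokens, start, end):
    """Return (start_char, end_char) for the token slice [start, end)."""
    _, first_idx = tokens[start]
    last_text, last_idx = tokens[end - 1]
    return first_idx, last_idx + len(last_text)


sentence = "Hi my name is Oliver!"
tokens = tokenize_with_offsets(sentence)

# Token 4 is "Oliver"; the slice [4, 5) covers just that token.
start_char, end_char = span_char_offsets(tokens, 4, 5)
print(sentence[start_char:end_char], start_char, end_char)
```

With spaCy, the same numbers should come from doc[4:5].start_char and doc[4:5].end_char.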

oliverbj · May 19, 2020, 4:36pm

So I actually found an answer right after posting this question (typical).

I found that I didn't need to save the information in entity_scores. Instead, I iterate over the entities that were actually found, using for ent in doc.ents:, which gives me access to all the standard spaCy attributes. See below:

ners = defaultdict(list)
ners['text'] = str(sentence)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for ent in doc.ents:
            if (score > threshold):
                ners['extractions'].append({
                    "label": str(ent.label_),
                    "text": str(ent.text),
                    "confidence": round(score, 2),
                    "start_position": ent.start_char,
                    "end_position": ent.end_char
                })

My entire method ends up looking like this:

from collections import defaultdict
from pprint import pprint

import spacy


def run():
    nlp = spacy.load('en_core_web_sm')
    sentence = "Hi my name is Oliver!"
    doc = nlp(sentence)

    # Threshold for the confidence scores.
    threshold = 0.2
    beams = nlp.entity.beam_parse(
        [doc], beam_width=16, beam_density=0.0001)

    ners = defaultdict(list)
    ners['text'] = str(sentence)
    for beam in beams:
        # score is the score of the whole candidate parse.
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            # doc.ents are Span objects, so start_char/end_char are available.
            for ent in doc.ents:
                if score > threshold:
                    ners['extractions'].append({
                        "label": str(ent.label_),
                        "text": str(ent.text),
                        "confidence": round(score, 2),
                        "start_position": ent.start_char,
                        "end_position": ent.end_char
                    })

    pprint(ners)
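
One caveat about the method above: the score returned by get_beam_parses belongs to a whole candidate parse, while doc.ents holds the entities of the final prediction, so the nested loop appends every entity once for each parse that clears the threshold. A possible alternative, sketched here, keeps the per-entity aggregation from the first post and derives the character offsets from the token indices. The beam_parses and tokens values are made-up stand-ins for spaCy output, and score_entities is a hypothetical helper.

```python
from collections import defaultdict

sentence = "Hi my name is Oliver!"

# Hypothetical (token_text, start_char) pairs for the sentence above.
tokens = [("Hi", 0), ("my", 3), ("name", 6), ("is", 11), ("Oliver", 14), ("!", 20)]

# Hypothetical beam output: (parse_score, [(start_token, end_token, label), ...]).
beam_parses = [
    (0.5, [(4, 5, "PERSON")]),
    (0.25, [(4, 5, "PERSON")]),
    (0.125, [(4, 5, "ORG")]),
    (0.125, []),
]


def score_entities(text, tokens, parses, threshold):
    """Sum parse scores per entity, then report entities above the threshold."""
    totals = defaultdict(float)
    for score, ents in parses:
        for start, end, label in ents:
            totals[(start, end, label)] += score

    extractions = []
    for (start, end, label), total in totals.items():
        if total > threshold:
            # Character offsets: start of the first token, end of the last one.
            start_char = tokens[start][1]
            last_text, last_idx = tokens[end - 1]
            end_char = last_idx + len(last_text)
            extractions.append({
                "label": label,
                "text": text[start_char:end_char],
                "confidence": round(total, 2),
                "start_position": start_char,
                "end_position": end_char,
            })
    return extractions


extractions = score_entities(sentence, tokens, beam_parses, threshold=0.2)
print(extractions)
```

Each entity appears once, with a confidence equal to the summed score of the parses that contain it. In spaCy itself, doc[start:end].start_char and doc[start:end].end_char would replace the manual offset lookup.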

ines · May 20, 2020, 12:32pm

Hi! This is a forum dedicated to our annotation tool Prodi.gy. While the discussion often touches on spaCy, as spaCy support is built into Prodigy, it's not the right place for general usage questions around spaCy.

Stack Overflow is a better fit, and I see you've already posted your question and solution there: https://stackoverflow.com/questions/61895995/get-the-start-and-end-position-of-found-named-entities
