What happens if your annotation has overlapping entity spans?

wpm · March 2, 2018, 5:16pm

When I fix the catastrophic forgetting problem by adding in entities detected by the baseline model, do I have to be careful that the new entity spans and the old entity spans don't overlap?

For example, say I am trying to build an NER model that finds sports teams. I have the following sentence.

The Florida Gators won their away game in California last night.

Out of the box, spaCy will annotate “Florida” and “California” as GPEs. What I ultimately want is to keep “California” as a GPE, but label “Florida Gators” as a SPORTS_TEAM.

In my training data I’ll label “Florida Gators” as SPORTS_TEAM, but then in order to combat catastrophic forgetting I’ll run the sentence through the baseline NER, have it tell me that “California” is a GPE, and add that span to my training data. The baseline NER will also tell me that “Florida” is a GPE, and I don’t want to have that information overwrite my SPORTS_TEAM annotation.

Is there some convention that spaCy/Prodigy uses to keep this straight, or do I just have to be careful not to overlap spans when I’m augmenting my training data?

honnibal · March 2, 2018, 7:55pm

The entity recognizer is constrained to predict only non-overlapping, non-nested spans. The training data should obey the same constraint. If you like, you could have two sentences with the different annotations in your data. I’m not sure whether this would hurt or help your performance, though.
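One way to keep augmented training data within this constraint is to drop any baseline span that overlaps a manually annotated span before merging the two sets. The sketch below is a minimal, library-free illustration using character offsets. The helper names `overlaps` and `merge_without_overlap` and the `(start, end, label)` tuple format are assumptions made for this example, not part of spaCy or Prodigy.

```python
def overlaps(a, b):
    """Return True if two (start, end, label) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def merge_without_overlap(manual_spans, baseline_spans):
    """Keep every manual span; add a baseline span only if it does not
    overlap a manual span or an already accepted baseline span."""
    merged = list(manual_spans)
    for span in baseline_spans:
        if not any(overlaps(span, kept) for kept in merged):
            merged.append(span)
    return sorted(merged)


text = "The Florida Gators won their away game in California last night."
manual = [(4, 18, "SPORTS_TEAM")]             # "Florida Gators"
baseline = [(4, 11, "GPE"), (42, 52, "GPE")]  # "Florida", "California"

# The baseline "Florida" span is dropped; "California" is kept.
merged = merge_without_overlap(manual, baseline)
```

Giving the manual annotations priority means the baseline model can only fill in spans you have not already labelled, so it never overwrites the SPORTS_TEAM annotation.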

If you want spaCy to learn to recover both annotations, you could have two EntityRecognizer instances in the pipeline. You would need to move the entity annotations into an extension attribute, because you don’t want the second entity recogniser to overwrite the entities set by the first one. Something like this should work:


import spacy
from spacy.tokens import Doc

Doc.set_extension('my_ents', default=None)

def move_ents_to_attr(doc):
    if doc._.my_ents is None:
        doc._.my_ents = []
    doc._.my_ents.extend(doc.ents)
    doc.ents = []
    return doc

nlp = spacy.load('en')
nlp.entity.postprocess.append(move_ents_to_attr)

sergei3000 · December 24, 2019, 12:04pm

What if I have more than two NER models? Do I need to create an extension attribute for each of them? Or should I code it differently to handle unknown number of models in my NER pipeline?

honnibal · December 25, 2019, 11:50am

You would need to have an extension attribute to hold the spans, yes. Internally spaCy encodes the entity annotations using IOB-style data, so there's no way to represent overlapping entities on the built-in token data.
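To see why IOB-style data rules out overlaps, it may help to look at a simplified encoder. This is only an illustrative sketch, not spaCy's internal implementation; the function name `spans_to_iob` and the `(start_token, end_token, label)` format are assumptions. Each token gets exactly one tag, so a second span that claims an already tagged token has nowhere to go.

```python
def spans_to_iob(n_tokens, spans):
    """Encode (start_token, end_token, label) spans as one IOB tag per token.

    end_token is exclusive. Raises ValueError if two spans claim the same token.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            if tags[i] != "O":
                raise ValueError(f"token {i} is already part of an entity")
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


tokens = ["The", "Florida", "Gators", "won"]

# A single span encodes cleanly, one tag per token.
tags = spans_to_iob(len(tokens), [(1, 3, "SPORTS_TEAM")])

# Adding the nested "Florida" GPE span fails, because token 1 is already tagged.
try:
    spans_to_iob(len(tokens), [(1, 3, "SPORTS_TEAM"), (1, 2, "GPE")])
    overlap_error = None
except ValueError as err:
    overlap_error = str(err)
```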

mkallen · March 23, 2020, 8:21pm

I tried to implement the code above but got the following error:

AttributeError: 'spacy.pipeline.pipes.EntityRecognizer' object has no attribute 'postprocess'

I can't seem to find anything on postprocess in the spaCy docs.

honnibal · March 29, 2020, 9:00am

@mkallen My mistake, sorry. Just add the function to the pipeline with nlp.add_pipe(move_ents_to_attr, last=True)
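For readers on spaCy v3: there, `nlp.add_pipe` takes the registered name of a component rather than the function itself, so the function has to be registered first. The following is a minimal sketch under that assumption. It uses a blank pipeline, and `fake_ner` is a hypothetical stand-in for a trained entity recognizer so the example runs without downloading a model.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc, Span

# Custom attribute that accumulates the entities moved out of doc.ents.
Doc.set_extension("my_ents", default=None, force=True)


@Language.component("fake_ner")
def fake_ner(doc):
    # Hypothetical stand-in for a real EntityRecognizer: tags token 1 as a GPE.
    doc.ents = [Span(doc, 1, 2, label="GPE")]
    return doc


@Language.component("move_ents_to_attr")
def move_ents_to_attr(doc):
    if doc._.my_ents is None:
        doc._.my_ents = []
    doc._.my_ents.extend(doc.ents)
    doc.ents = []  # clear doc.ents so a later recognizer starts from scratch
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("fake_ner")
nlp.add_pipe("move_ents_to_attr", last=True)

doc = nlp("The Florida Gators won their away game in California last night.")
moved = [(ent.text, ent.label_) for ent in doc._.my_ents]
```

With several recognizers, the same mover component could be added after each one, since it appends to the list instead of replacing it.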

mkallen · March 30, 2020, 3:09pm

Thank you

adsk2050 · January 10, 2024, 1:11pm

I have read that Spans can have overlapping entities:

Unlike in doc.ents, overlapping matches are allowed in doc.spans, 
so no filtering is required, but optional filtering and sorting can be applied 
to the spans before they’re saved.

Can I use this to create training data for my spaCy model?
If yes, how? DocBin doesn't accept overlapping spans.

magdaaniol · January 12, 2024, 12:45pm

Hi @adsk2050 and welcome to the forum!

This note in the Prodigy span categorization docs contains just the info you need, I think.
You should consider training a spaCy SpanCategorizer.
Did data-to-spacy with a --spancat dataset not work for you?
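If you are building the training data by hand rather than through data-to-spacy, something along these lines may work. It is a minimal sketch assuming spaCy v3, where, as far as I know, DocBin serializes the span groups in doc.spans together with each doc. The key "sc" is the default spans key of the span categorizer; the token indices are specific to this example sentence.

```python
import spacy
from spacy.tokens import DocBin, Span

nlp = spacy.blank("en")
doc = nlp("The Florida Gators won their away game in California last night.")

# Overlapping spans go into a span group rather than doc.ents.
doc.spans["sc"] = [
    Span(doc, 1, 3, label="SPORTS_TEAM"),  # "Florida Gators"
    Span(doc, 1, 2, label="GPE"),          # "Florida", nested in the span above
    Span(doc, 8, 9, label="GPE"),          # "California"
]

# Serialize and read back to check that the span group survives the round trip.
data = DocBin(docs=[doc]).to_bytes()
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))[0]
restored_spans = [(span.text, span.label_) for span in restored.spans["sc"]]
```

To write a file for training, `DocBin.to_disk` can be used in place of `to_bytes`.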
