How to Restrict Nested Spans of Same Label Type in Spancat

clinical_nlp · February 16, 2022, 7:43pm

Hi,

I've had success training a span categorizer using data annotated in prodigy. The nested spans were one reason to use it, and getting an output with 'fatty liver' as a problem and 'liver' as a body location has been great.

However, there are occasions where e.g. 'pre-diabetes' will return 'pre-diabetes' as a problem as well as 'diabetes' as a problem. No data was annotated this way, and in this and similar cases I would prefer the output to contain only the longer span.

Is there a way to enable this behavior? I know the spancat component has a max_positive parameter, but I don't want to restrict the former case (nested spans with different labels), just the latter (nested spans with the same label).

Thank you!

ines · February 21, 2022, 11:01am

Hi! I think in that case, the easiest solution would be to add a custom rule-based component that checks for overlapping spans in doc.spans that have the same label and removes the shorter spans if necessary. This is a lot more straightforward, maintainable and reliable than trying to mess with the model weights and potentially introducing other unintended side-effects.
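The check described above boils down to interval containment: drop a span if another, longer span with the same label fully contains it. Here is a minimal, spaCy-free sketch of that logic, assuming spans are represented as hypothetical (start, end, label) tuples with an exclusive end, like spaCy token offsets. The function name drop_nested_same_label and the offsets in the example are illustrative only.

```python
from typing import List, Tuple

# Hypothetical representation: (start token, end token exclusive, label)
SpanTuple = Tuple[int, int, str]


def drop_nested_same_label(spans: List[SpanTuple]) -> List[SpanTuple]:
    """Keep a span unless a different span with the same label contains it."""
    kept = []
    for start, end, label in spans:
        nested = any(
            other_label == label
            and other_start <= start
            and end <= other_end
            and (other_start, other_end) != (start, end)  # ignore the span itself
            for other_start, other_end, other_label in spans
        )
        if not nested:
            kept.append((start, end, label))
    return kept


# Illustrative offsets: "pre-diabetes" (0, 3) contains "diabetes" (2, 3),
# "fatty liver" (5, 7) contains "liver" (6, 7)
spans = [(0, 3, "PROBLEM"), (2, 3, "PROBLEM"), (5, 7, "PROBLEM"), (6, 7, "BODY_LOCATION")]
print(drop_nested_same_label(spans))
```

'liver' survives because its label differs from the span that contains it, which is exactly the fatty liver / liver case from the original question.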

clinical_nlp · February 21, 2022, 3:53pm

Thanks, Ines. Something like the code below is a pretty simple workaround. A couple of questions:

Is it possible to drop a span? I can capture the overlapping span and start value to remove it from a table where I unpack the spans, but to make the logic work as a component I think I'd need to modify the doc?
For annotation purposes, with the approach you suggested I'm assuming I'd need to get predictions on my text sample and then use the predictions in spans.manual; this approach would be incompatible with spans.correct?

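# iterate through the spans
for i in doc.spans['sc']:
    # identify any nested spans of same label type
    # (the list always includes i itself, so > 1 means i sits inside another span)
    if len([x for x in doc.spans['sc'] if (i.label_ == x.label_) and (i.start >= x.start) and (i.end <= x.end)]) > 1:
        # prune span (how do I drop it from doc.spans['sc']?)
        pass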

ines · February 22, 2022, 5:10pm

Yes, you can write to doc.spans["sc"] and replace it with a list of filtered Span objects.
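A minimal sketch of writing a filtered list back to doc.spans["sc"], assuming spaCy v3. The Doc is built by hand so the token indices don't depend on the tokenizer; the words, labels and the helper is_nested_same_label are illustrative, not output from a trained pipeline.

```python
import spacy
from spacy.tokens import Doc, Span

nlp = spacy.blank("en")
# Hand-built tokens for "She has pre-diabetes and fatty liver"
words = ["She", "has", "pre", "-", "diabetes", "and", "fatty", "liver"]
spaces = [True, True, False, False, True, True, True, False]
doc = Doc(nlp.vocab, words=words, spaces=spaces)

# Hypothetical predictions, including a same-label nested span
doc.spans["sc"] = [
    Span(doc, 2, 5, label="PROBLEM"),        # pre-diabetes
    Span(doc, 4, 5, label="PROBLEM"),        # diabetes (nested, same label)
    Span(doc, 6, 8, label="PROBLEM"),        # fatty liver
    Span(doc, 7, 8, label="BODY_LOCATION"),  # liver (nested, different label)
]


def is_nested_same_label(span, group):
    """True if a longer span with the same label contains this span."""
    return any(
        other.label_ == span.label_
        and other.start <= span.start
        and span.end <= other.end
        and len(other) > len(span)
        for other in group
    )


group = doc.spans["sc"]
# Replace the group with a plain list of the spans we want to keep
doc.spans["sc"] = [span for span in group if not is_nested_same_label(span, group)]
print([(s.text, s.label_) for s in doc.spans["sc"]])
```

Assigning a plain list creates a new SpanGroup, so group-level attrs such as the scores spancat stores are not carried over automatically; they have to be filtered in parallel with the spans.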

The spans.correct recipe will show you whatever the pipeline produces in doc.spans, so if you first run your trained spancat component, followed by your rule-based component, you will only see the filtered spans.
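To illustrate why component order matters, here is a sketch with a hypothetical mock_spancat component standing in for a trained spancat (it simply hard-codes predictions) and a filter added after it. Prodigy's spans.correct recipe is not run here; the assumption is only that anything reading doc.spans after the pipeline has run sees the filtered result. For brevity the sketch ignores the scores attribute, and both component names are made up.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span


@Language.component("mock_spancat")
def mock_spancat(doc):
    # Hypothetical stand-in for a trained spancat: hard-coded predictions
    # for the text "chronic kidney disease" (tokens 0, 1, 2)
    doc.spans["sc"] = [
        Span(doc, 0, 3, label="PROBLEM"),        # chronic kidney disease
        Span(doc, 1, 3, label="PROBLEM"),        # kidney disease (nested, same label)
        Span(doc, 1, 2, label="BODY_LOCATION"),  # kidney (nested, different label)
    ]
    return doc


@Language.component("nested_same_label_filter")
def nested_same_label_filter(doc):
    # Drop spans contained in a longer span that has the same label
    group = doc.spans["sc"]
    doc.spans["sc"] = [
        span
        for span in group
        if not any(
            other.label_ == span.label_
            and other.start <= span.start
            and span.end <= other.end
            and len(other) > len(span)
            for other in group
        )
    ]
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("mock_spancat")
# The filter runs after the span predictor, so doc.spans only holds filtered spans
nlp.add_pipe("nested_same_label_filter", after="mock_spancat")

doc = nlp("chronic kidney disease")
print(nlp.pipe_names)
print([(s.text, s.label_) for s in doc.spans["sc"]])
```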

clinical_nlp · February 23, 2022, 7:07pm

In case this comes up for someone else: working from Ines's guidance, I think the fix is to create a component that removes nested spans of the same label type:

from spacy.language import Language

@Language.component("overlapping_span_filter")
def overlapping_span_filter(doc):
    """
    Rule-based component that removes spans nested inside a longer span
    with the same label, keeping doc.spans['sc'] and its scores aligned
    """
    spans = doc.spans['sc']
    # lists for the spans we keep and their corresponding scores
    not_overlapping_spans = []
    scores_for_nos = []
    # iterate through the spans and their scores together
    for span, span_score in zip(spans, spans.attrs["scores"]):
        # count same-label spans that contain this one (the span itself always matches)
        containing = [x for x in spans if (span.label_ == x.label_) and (span.start >= x.start) and (
            span.end <= x.end)]
        if len(containing) == 1:
            # not nested inside another span of the same label, so keep it
            not_overlapping_spans.append(span)
            scores_for_nos.append(span_score)
    # write the filtered spans and scores back to the doc
    doc.spans['sc'] = not_overlapping_spans
    doc.spans['sc'].attrs["scores"] = scores_for_nos
    return doc

Then modify the spans.correct recipe so that it adds the component after the spancat component when it loads the spaCy model:

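import spacy
# the module defining overlapping_span_filter must be imported first so the component is registered
nlp = spacy.load(spacy_model)
nlp.add_pipe("overlapping_span_filter", after="spancat")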
