Does ner.llm.correct produce overlapping entities?

DGMS90 · January 6, 2025, 10:52am

Hey,

I'm just looking for some advice on some NER annotations that I'm doing.

I'm currently using the ner.llm.correct recipe to connect to a model hosted in Azure (GPT 4o-mini). The task is to annotate job titles, and I want to extract multiple entity types, for example TITLE and SPECIALISM.

In the attached screenshot, it looks like the model is outputting overlapping entities. Is this the correct behaviour? My understanding was that entities in a NER task should be distinct and non-overlapping.

Thanks in advance!

magdaaniol · January 8, 2025, 3:52pm

Hi @DGMS90,

You're right that the spacy-llm NER task produces non-overlapping entities.
What you see under "Show the response from the LLM" is the model's raw response. If you check the prompt shown above it, you'll see the LLM was instructed to extract all relevant entities, and that can include overlapping ones (note that you can customize the prompt).
The idea is to let the LLM overgenerate and have the task logic process the response into valid output. In this case, the spacy-llm NER task filters out duplicate and/or overlapping entities, preferring the longer one when two overlap.
Concretely, here's the function that does this.

The result of the NER task's overlap resolution is what you see in Prodigy's NER card: only the TITLE is highlighted. The other entities returned by the LLM were discarded by the filtering function I linked above.
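
To make the longest-first rule concrete, here's a minimal sketch in plain Python. It is not the spacy-llm code itself, just an illustration of the strategy described above. The `(start, end, label)` tuple format, the `filter_overlaps` name, and the example spans are all made up for this post.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def filter_overlaps(spans: List[Span]) -> List[Span]:
    """Drop duplicate and overlapping spans, preferring longer ones."""
    # Longest spans first; among equal lengths, the earliest start wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: List[Span] = []
    taken = set()  # token indices already covered by a kept span
    for start, end, label in ordered:
        tokens = range(start, end)
        # Keep a span only if none of its tokens are already covered.
        if not any(t in taken for t in tokens):
            kept.append((start, end, label))
            taken.update(tokens)
    # Return the surviving spans in document order.
    return sorted(kept)


# Assumed raw LLM output for a four-token job title: the full TITLE,
# a SPECIALISM nested inside it, and a duplicate of the TITLE.
raw = [(0, 4, "TITLE"), (1, 2, "SPECIALISM"), (0, 4, "TITLE")]
print(filter_overlaps(raw))  # [(0, 4, 'TITLE')]
```

This matches what you see in the card: the nested SPECIALISM loses to the longer TITLE. If you're working with spaCy `Span` objects directly, `spacy.util.filter_spans` applies the same longest-first rule.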

DGMS90 · January 9, 2025, 9:59am

Ah amazing - thanks for the quick response and the link to the filtering code

That is what I thought was happening, but I wanted to double check rather than assume.

magdaaniol · January 10, 2025, 12:47pm

Glad that helped
