I'm just looking for some advice on some NER annotations that I'm doing.
I'm currently using the ner.llm.correct recipe to connect to a model hosted in Azure (GPT-4o mini). The task is to annotate job titles, and I want to extract multiple entity types, for example TITLE and SPECIALISM.
In the attached screenshot, it looks like the model is outputting overlapping entities. Is this the correct behaviour? My understanding was that entities in a NER task should be distinct and non-overlapping.
You're definitely right that the spacy-llm NER task produces non-overlapping entities.
What you see under "Show the response from the LLM" is the model's raw response. If you check the prompt shown just above it, you'll see the LLM was instructed to extract all relevant entities - and that can include overlapping entities (note that you can customize the prompt).
The idea is to let the LLM overgenerate and have the task logic process the response into valid output. In this case, the spacy-llm NER task filters out duplicate and/or overlapping entities, preferring the longer one in case of overlap.
Concretely, here's the function that does this.
The result of the NER task's resolution of the overlapping spans is what you see in Prodigy's NER card: only the TITLE is highlighted. The other entities returned by the LLM were discarded by the filtering function I linked above.
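
To make the "prefer the longer span" rule concrete, here's a rough standalone sketch in plain Python. This is not the spacy-llm code itself, just an illustration of the same idea. The `filter_overlapping` helper, the character-offset tuples and the example job title are all made up for this post.

```python
from typing import List, Tuple

# (start_char, end_char, label); end_char is exclusive.
Span = Tuple[int, int, str]


def filter_overlapping(spans: List[Span]) -> List[Span]:
    """Hypothetical helper: keep the longest spans, drop duplicates/overlaps."""
    # Longest candidates first; ties are broken by the earliest start offset.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    taken = set()  # character offsets already covered by a kept span
    for start, end, label in ordered:
        if any(i in taken for i in range(start, end)):
            continue  # overlaps (or duplicates) something we already kept
        kept.append((start, end, label))
        taken.update(range(start, end))
    return sorted(kept)  # back to document order


# Made-up example: the LLM "overgenerates" three candidates for one job title.
text = "Senior Cardiology Nurse"
candidates = [
    (0, 23, "TITLE"),       # the whole string
    (7, 17, "SPECIALISM"),  # "Cardiology", nested inside the TITLE span
    (0, 23, "TITLE"),       # exact duplicate
]

print(filter_overlapping(candidates))
# [(0, 23, 'TITLE')]
```

Only the longer TITLE span survives, which matches what you're seeing in the NER card. If you're working with spaCy `Span` objects directly, `spacy.util.filter_spans` applies the same longest-first preference.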