ValueError: A Token can only be part of one entity [...]

thomashacker · November 9, 2019, 12:16pm

Hey everyone! I was following the "Training a new Entity Type" YouTube tutorial and suddenly got this error:

ValueError: [E103] Trying to set conflicting doc.ents: '(94, 98, 'CONDITION')' and '(94, 98, 'CONDITION')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

I can't really tell what's causing this.

The sentence that triggered the error was:
"Resolvi usar novamente a carnitina, depois de ler que ele resolve mtos problemas."

Maybe I got the error because the text is in a different language?

Thank you for your help and Greetings from Berlin City!

ines · November 11, 2019, 12:11pm

Hi! The language is definitely not the problem here. What the error message is trying to tell you is that you somehow ended up with two entity annotations that overlap - or, in this case, are identical.

I'm a bit confused how this could have happened – normally, Prodigy should only ever show you the same text once, so there's not really a way to generate exact duplicates, because you should never be asked the same thing twice.

When did this error occur? During annotation with ner.teach, or during training with ner.batch-train? Are you using the latest version of Prodigy? And could you run the db-out command to export your dataset and try to find the sentence (e.g. in your editor)? Is it in there twice, or only once?
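If searching by hand is tedious, the same check can be scripted. This is only a minimal sketch: it assumes the export is a JSONL file with one task per line and a "text" key on each task, and the function name find_duplicate_texts is made up for illustration.

```python
import json
from collections import Counter


def find_duplicate_texts(jsonl_lines):
    """Return {text: count} for every text that occurs more than once.

    jsonl_lines: any iterable of JSON strings, e.g. an open file.
    Assumes each record has a "text" key (an assumption about the export).
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line)["text"]] += 1
    return {text: n for text, n in counts.items() if n > 1}


# Hypothetical usage on an exported file:
# with open("export.jsonl", encoding="utf8") as f:
#     print(find_duplicate_texts(f))
```

An empty result means every text is in the dataset only once, so the duplicate would have to come from somewhere else.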

Edit: Can you check if you're using spaCy v2.2? Prodigy isn't officially compatible with the latest version yet, which introduces backwards-incompatible stricter handling of overlapping entities. If you're installing from the Prodigy wheel, it should auto-install the compatible spaCy version. Also see here:

nialloconnor · November 20, 2019, 8:59pm

@ines thanks for the tip! I noticed this after upgrading to spaCy v2.2.

poziryna84 · July 27, 2020, 11:21am

Hi ines,

I have the same problem, and in my case it definitely occurred during annotation with brat. I extracted the texts with overlapping annotations and it looks like this:

train_data[0][0][1391:1448]
Out[125]: 'carcinoma renal papilar de células claras y eosinofílicas'

train_data[0][0][1391:1414]
Out[126]: 'carcinoma renal papilar'

The expression and its subexpression were both annotated.

How should I deal with that?

ines · July 28, 2020, 8:54am

If you want to use the data to train a named entity recognition model, you'd have to pick one of the spans and possibly adjust your annotation scheme so it's something the model can learn from most effectively. In the example you posted, the first one looks more like a whole subclause, right? This wouldn't really be a good fit anyways, and likely something a model would struggle to learn because it's pretty far from what's typically considered a named entity (e.g. a proper noun).

It's probably a good approach to prefer shorter spans – you should be able to do this programmatically by just iterating over your annotations, finding duplicates with overlapping start/end indices and filtering for the shortest span. You could also use Prodigy to stream in both versions of the text and manually select which one you prefer / which one makes most sense.
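The programmatic filtering could look roughly like the sketch below. It assumes the annotations are (start, end, label) tuples with character offsets and an exclusive end, as in spaCy v2-style training data. The function name prefer_shortest and the "CONDITION" label on the longer example spans are placeholders, not something from the posts above.

```python
def prefer_shortest(spans):
    """Drop overlapping spans, keeping the shortest one in each conflict.

    spans: list of (start, end, label) tuples, end exclusive (assumed format).
    Exact duplicates are removed as well.
    """
    kept = []
    # Visit the shortest spans first; ties are broken by start and label
    # so the result is deterministic.
    by_length = sorted(set(spans), key=lambda s: (s[1] - s[0], s[0], s[2]))
    for start, end, label in by_length:
        # Keep the span only if it overlaps nothing already kept.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


spans = [
    (1391, 1448, "CONDITION"),  # the longer expression
    (1391, 1414, "CONDITION"),  # its subexpression
    (94, 98, "CONDITION"),      # exact duplicates, as in the E103 message
    (94, 98, "CONDITION"),
]
print(prefer_shortest(spans))
# [(94, 98, 'CONDITION'), (1391, 1414, 'CONDITION')]
```

Because shorter spans are visited first, a long span that contains two separate short ones is dropped in favour of both. Whether that is the right call depends on the annotation scheme.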
