ValueError: [E868] Found a conflicting gold annotation in a reference document

XBeg9 · January 3, 2022, 10:26pm

ValueError: [E868] Found a conflicting gold annotation in a reference document, with the following char-based span occurring both in the gold ents as well as in the negative spans: (221, 224, 'SKILL').

I'd love any suggestions. I got this just from going through the Prodigy labeler, and now my dataset is broken: I can't train the model because of this error, and I can't find a way to skip it. Any suggestion will be appreciated.

ines · January 3, 2022, 10:37pm

Hi! How did you create your annotations, did you use ner.teach? Basically, the underlying problem here is that somehow, your data ended up with two annotations on the exact same example and span suggestion, but once accepted and once rejected.

Prodigy should definitely ignore cases like this and not add both versions to the training data if there's a conflict (just like it does for conflicting overlapping entities). So we'll fix this for the next release! Sorry you got blocked on this.

In the meantime, one workaround is to export your data with db-out and look for this span in the "spans" of the examples (with a "start" value of 221 and an "end" of 224). Remove one of the conflicting examples, then re-import the data into a new dataset and keep training.
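The lookup described above can be sketched in Python. This is a minimal sketch, not part of Prodigy: the helper names `load_jsonl` and `find_span` and the file name `dataset.jsonl` are made up for illustration, and it assumes each exported line is a JSON object with a "spans" list of "start"/"end"/"label" entries and an "answer" field.

```python
import json


def load_jsonl(path):
    """Read a JSONL export (one JSON object per line) into a list of dicts."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_span(examples, start, end, label=None):
    """Return (index, example) pairs whose "spans" contain the given offsets.

    If label is given, the span's "label" must match as well.
    """
    hits = []
    for i, eg in enumerate(examples):
        # "spans" may be missing or null on some examples
        for span in eg.get("spans") or []:
            same_offsets = span.get("start") == start and span.get("end") == end
            same_label = label is None or span.get("label") == label
            if same_offsets and same_label:
                hits.append((i, eg))
                break
    return hits


# Usage (the file name is a placeholder for your own export):
# for i, eg in find_span(load_jsonl("dataset.jsonl"), 221, 224, "SKILL"):
#     print(i, eg.get("answer"), repr(eg.get("text", "")[221:224]))
```

Printing the "answer" next to each hit should show which copies of the span were accepted and which were rejected, so you can decide which example to drop.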

XBeg9 · January 3, 2022, 10:47pm

Hi @ines, yep, I was using exactly ner.teach (binary) and ended up with this issue. Looking at my training dataset, there are sometimes small differences between the labeled texts (drop_duplicates didn't help). I was curious why Prodigy kept asking me the same tasks again, then realized they are actually different texts with a slight change (the original one, not segmented). Just giving some context; that's probably how I ended up at this dead end...

Anyway, yeah db-out is exactly what I am doing right now, I hope this will solve the problem.

Thanks!

XBeg9 · January 3, 2022, 10:53pm

I'd love any suggestions on how to automate this. I've already deleted about 20 of them and it keeps raising the same error for other spans.

Edited:
Actually, I got it fixed. It just took removing around 21 wrong spans from the training dataset.

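For anyone who ends up here:

1. Export the dataset: prodigy db-out {name} > dataset.jsonl
2. Search the file for the span from the error message, e.g. {"start":937,"end":946, and remove that example from your labeled set.
3. Import the result into a new dataset: prodigy db-in {new_name} dataset.jsonl
4. Try to train again.

If the error comes back for another span, repeat the steps.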
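The repeated export, search and delete loop above could be automated in one pass. The following is a rough sketch under some assumptions, not Prodigy's own logic: the function names and file names are hypothetical, examples are grouped by their "_input_hash" (falling back to "text" if the hash is missing), and whenever the same span was both accepted and rejected, the rejected example is the one dropped. That last part is a choice made here; you may prefer to review the conflicts by hand.

```python
import json
from collections import defaultdict


def read_jsonl(path):
    """Read a JSONL export into a list of dicts."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, examples):
    """Write a list of dicts back out, one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")


def span_keys(eg):
    """One key per span: (input identity, start, end, label)."""
    # Assumption: "_input_hash" identifies the input text; fall back to the text
    doc_key = eg.get("_input_hash", eg.get("text"))
    return [
        (doc_key, s["start"], s["end"], s.get("label"))
        for s in eg.get("spans") or []
    ]


def drop_conflicting_rejects(examples):
    """Drop rejected examples whose span was also accepted on the same input.

    Returns (kept_examples, conflicting_keys).
    """
    answers = defaultdict(set)
    for eg in examples:
        for key in span_keys(eg):
            answers[key].add(eg.get("answer"))
    conflicts = {k for k, seen in answers.items() if {"accept", "reject"} <= seen}
    kept = [
        eg for eg in examples
        if not (eg.get("answer") == "reject"
                and any(k in conflicts for k in span_keys(eg)))
    ]
    return kept, conflicts


# Usage (file names are placeholders):
# kept, conflicts = drop_conflicting_rejects(read_jsonl("dataset.jsonl"))
# print(f"Removed rejects for {len(conflicts)} conflicting spans")
# write_jsonl("dataset_clean.jsonl", kept)
```

The cleaned file can then be imported into a new dataset with db-in, as in the steps above. Because the grouping relies on the inputs being identical, texts that differ slightly will not be matched to each other.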

XBeg9 · January 4, 2022, 3:34am

By the way, during training I got this error, in case it helps you find the root cause:

<Task finished name='Task-652' coro=<RequestResponseCycle.run_asgi() done, defined at /Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py:394> exception=ValueError("[E868] Found a conflicting gold annotation in a reference document, with the following char-based span occurring both in the gold ents as well as in the negative spans: (221, 224, 'SKILL').")>

ines · January 5, 2022, 1:20pm

Glad you found a solution!

We also just released Prodigy v1.11.7, which should resolve the underlying problem. If there are conflicting annotations of the same span (accepted and rejected), Prodigy will now ignore the rejected span and won't add it to the data, which prevents this error from being raised by spaCy.
