ValueError: [E868] Found a conflicting gold annotation in a reference document

ValueError: [E868] Found a conflicting gold annotation in a reference document, with the following char-based span occurring both in the gold ents as well as in the negative spans: (221, 224, 'SKILL').

I'd love any suggestions. I got this after just annotating with the Prodigy labeler, and now my dataset seems fully broken: I can't train the model because of this error, and I can't find a way to skip it. Any suggestion would be appreciated :slight_smile:

Hi! How did you create your annotations? Did you use ner.teach? Basically, the underlying problem here is that your data somehow ended up with two annotations of the exact same example and span suggestion, one accepted and one rejected.

Prodigy should definitely ignore cases like this and not add both versions to the training data if there's a conflict (just like it does for conflicting overlapping entities). So we'll fix this for the next release! Sorry you got blocked on this.

In the meantime, one workaround would be to export your data with db-out and look for this span in the "spans" of the examples (with a start value of 221 and an end value of 224). You can then remove one of the conflicting examples, re-import the data to a new set and keep training.
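To make the lookup above less manual, here is a minimal Python sketch that scans the exported lines for a given span. It assumes the export is JSONL where each example has a "spans" list with "start", "end" and "label" keys. The helper name find_examples_with_span is made up for illustration and is not part of Prodigy.

```python
import json


def find_examples_with_span(lines, start, end, label=None):
    """Return (line_number, example) pairs whose "spans" contain the given offsets.

    Hypothetical helper, not a Prodigy API. `lines` is any iterable of
    JSONL strings, e.g. an open file handle on the db-out export.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        example = json.loads(line)
        for span in example.get("spans") or []:
            same_offsets = span.get("start") == start and span.get("end") == end
            same_label = label is None or span.get("label") == label
            if same_offsets and same_label:
                hits.append((line_no, example))
                break  # one hit per example is enough
    return hits
```

You could call it with an open file, for example find_examples_with_span(open("dataset.jsonl", encoding="utf8"), 221, 224, "SKILL"), and then compare the "answer" values of the hits to see which one to drop.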

Hi @ines, yep, I was doing exactly that, ner.teach (binary), and ended up with this issue. I am checking my training dataset, and it looks like there is sometimes a small difference between the labeled texts (drop_duplicates didn't help). I was curious why Prodigy kept asking me the same tasks again, then realized those are actually different texts with a slight change (the original one, not segmented). Just giving some context, that's probably how I ended up at this dead end...

Anyway, yeah, db-out is exactly what I am doing right now. I hope this will solve the problem.

Thanks!

I would love any suggestions on how to automate this. I've already deleted around 20 of them and it keeps raising the same error for other spans :frowning:

edited:
Actually, I got it fixed. It just took removing around 21 conflicting spans from the training dataset.

For anyone who ends up here, use:

  • prodigy db-out {name} dataset.jsonl
  • then search the file for the span from the error message (in my case {"start":937,"end":946) and remove that example from your labeled set
  • prodigy db-in {new_name} dataset.jsonl
  • try to train

If training fails again with the same error for a different span, repeat the steps for that span.
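If you don't want to repeat this by hand for every span, the steps can be sketched in Python roughly as below. This is only a sketch under a few assumptions: the export is JSONL with "text", "spans" and "answer" keys, a conflict means the same text, offsets and label was both accepted and rejected, and dropping the rejected copy is acceptable for your data. The function names are made up for illustration and are not Prodigy APIs.

```python
import json
from collections import defaultdict


def span_key(example, span):
    # Assumption: a span is identified by its text plus offsets and label.
    return (example.get("text"), span.get("start"), span.get("end"), span.get("label"))


def drop_conflicting_rejects(examples):
    """Split examples into (kept, dropped).

    An example is dropped if it was rejected and one of its spans was
    also accepted in another example with the same text.
    """
    examples = list(examples)
    answers = defaultdict(set)
    for eg in examples:
        for span in eg.get("spans") or []:
            answers[span_key(eg, span)].add(eg.get("answer"))
    # Spans that occur with both answers are the conflicting ones.
    conflicts = {key for key, seen in answers.items() if {"accept", "reject"} <= seen}

    kept, dropped = [], []
    for eg in examples:
        conflicting_reject = eg.get("answer") == "reject" and any(
            span_key(eg, span) in conflicts for span in eg.get("spans") or []
        )
        (dropped if conflicting_reject else kept).append(eg)
    return kept, dropped


def read_jsonl(path):
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, examples):
    with open(path, "w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")
```

For example, run kept, dropped = drop_conflicting_rejects(read_jsonl("dataset.jsonl")), write kept to a new file with write_jsonl, and import that file with prodigy db-in {new_name}. Note that it matches on the exact text, so near-duplicate texts with small differences are treated as separate examples.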


Btw, during training I got this error, in case it helps you find the root cause:

<Task finished name='Task-652' coro=<RequestResponseCycle.run_asgi() done, defined at /Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py:394> exception=ValueError("[E868] Found a conflicting gold annotation in a reference document, with the following char-based span occurring both in the gold ents as well as in the negative spans: (221, 224, 'SKILL').")>

Glad you found a solution!

We also just released Prodigy v1.11.7, which should resolve the underlying problem :slight_smile: If there are conflicting annotations of the same span (accepted and rejected), Prodigy will now ignore the rejected span and won't add it to the data, which prevents this error from being raised by spaCy.
