I have Prodigy v1.10.2. I used ner.manual to annotate some named entities, which were saved in a Prodigy dataset; let's call it "myDataset1". Then I ran rel.manual with the source dataset:myDataset1 and --label myRel1,myRel2 --span-label myEntity1,myEntity2, where the entities are the same ones I had annotated with ner.manual.
Then I annotated myRel1 and myRel2 between the named entities, and I also changed some of the named entity annotations I had made previously. I saved the result as "myDataset2".
Now I want to use the data-to-spacy command. First I tried
✘ Invalid data for component 'ner'
spans -> 7 -> start field required
spans -> 7 -> end field required
This error was also mentioned previously here. I exported the dataset using db-out, and that worked. I checked the data and, indeed, one example had a span containing only {'label': 'myEntity1'}, without 'text', 'start', 'end', etc.
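For anyone wanting to run the same check on a db-out export, here is a minimal sketch. It assumes the export is JSONL (one JSON object per line) and that each example keeps its spans under a "spans" key; the helper name find_bad_spans and the sample lines are made up for illustration and are not part of Prodigy.

```python
import json


def find_bad_spans(jsonl_lines):
    """Return (line_index, span) pairs for spans lacking 'start' or 'end'.

    Hypothetical helper: `jsonl_lines` is any iterable of JSONL strings,
    e.g. an open file handle on the db-out export.
    """
    bad = []
    for i, line in enumerate(jsonl_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        eg = json.loads(line)
        # Treat a missing or null "spans" value as "no spans"
        for span in eg.get("spans") or []:
            if "start" not in span or "end" not in span:
                bad.append((i, span))
    return bad


# Illustrative data only, mimicking the invalid span described above
sample = [
    '{"text": "a b", "spans": [{"start": 0, "end": 1, "label": "myEntity1"}]}',
    '{"text": "c d", "spans": [{"label": "myEntity1"}]}',
]
print(find_bad_spans(sample))
```

To scan a real export, you could pass an open file object instead of the sample list, since iterating over a file yields its lines.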
I also tried to create the spaCy datasets separately, which worked for the parser:
Hi! Do you know which version of Prodigy you created the annotations with? The same version you're currently using? It seems like for some reason, you ended up with an invalid span here.
The easiest way to find and exclude it would be to go over your data, check whether each span includes a start and end, and keep only the valid spans in a new dataset:
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("myDataset2")
filtered_examples = []
for eg in examples:
    if "spans" in eg:
        new_spans = []
        for span in eg["spans"]:
            if "start" not in span or "end" not in span:
                print("Found bad span:", span)
            else:
                new_spans.append(span)
        eg["spans"] = new_spans
    filtered_examples.append(eg)
# Add filtered examples to new dataset
db.add_dataset("myDataset2_filtered")
db.add_examples(filtered_examples, ["myDataset2_filtered"])
Hi, I'm encountering a similar error. Using rel.manual, I occasionally get invalid spans in the annotations. A span is expected to look like this: {"start":42,"end":63,"token_start":9,"token_end":12,"label":"CAPAB"}
But there are stray span entries hidden among the span annotations that contain only a label: {"label":"CAPAB"}
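To make the difference between the two shapes explicit, here is a small sketch that reports which keys a span is missing. The set of expected keys is an assumption taken from the valid example above, not an official Prodigy schema, and missing_span_keys is a hypothetical helper name.

```python
# Assumed from the valid span shown above; not an official schema
EXPECTED_KEYS = {"start", "end", "token_start", "token_end", "label"}


def missing_span_keys(span):
    """Return a sorted list of the expected keys that `span` lacks."""
    return sorted(EXPECTED_KEYS - set(span))


good = {"start": 42, "end": 63, "token_start": 9, "token_end": 12, "label": "CAPAB"}
bad = {"label": "CAPAB"}

print(missing_span_keys(good))
print(missing_span_keys(bad))
```

An empty list means the span has every expected key; anything else names exactly what is missing, which may help when looking for a pattern in which annotations produce the stray entries.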
Seeing if I can replicate the error.
If you can find a reproducible example/workflow, that would be super helpful. It seems like there's some issue with how spans are added/removed that only occurs under certain conditions, given that it hasn't come up a lot.