No start and end of span using data-to-spacy after rel.manual

ines · September 7, 2020, 7:45pm

Hi! Do you know which version of Prodigy you created the annotations with? The same version you're currently using? It seems like for some reason, you ended up with an invalid span here.

The easiest way to find and exclude it is to go over your data, check whether each span includes a start and end, and keep only the valid spans in a new dataset:

from prodigy.components.db import connect

# Connect to the database configured in your prodigy.json
db = connect()
examples = db.get_dataset("myDataset2")
filtered_examples = []
for eg in examples:
    if "spans" in eg:
        new_spans = []
        for span in eg["spans"]:
            # A valid span needs both character offsets
            if "start" not in span or "end" not in span:
                print("Found bad span:", span)
            else:
                new_spans.append(span)
        eg["spans"] = new_spans
    # Keep every example, now with only its valid spans
    filtered_examples.append(eg)

# Add filtered examples to new dataset
db.add_dataset("myDataset2_filtered")
db.add_examples(filtered_examples, ["myDataset2_filtered"])
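
If you want to check what would be dropped before writing anything to the database, the same check can be pulled out into a plain function that works on a list of dicts. This is only a sketch: the name drop_invalid_spans and the sample data are made up for illustration and are not part of Prodigy.

```python
from typing import Any, Dict, List, Tuple

Example = Dict[str, Any]


def drop_invalid_spans(
    examples: List[Example],
) -> Tuple[List[Example], List[Dict[str, Any]]]:
    """Return (cleaned examples, rejected spans).

    Hypothetical helper: a span counts as valid only if it has both
    a "start" and an "end" key. Input examples are not modified.
    """
    cleaned: List[Example] = []
    rejected: List[Dict[str, Any]] = []
    for eg in examples:
        eg = dict(eg)  # shallow copy so the caller's dict stays untouched
        if "spans" in eg:
            valid = []
            for span in eg["spans"]:
                if "start" in span and "end" in span:
                    valid.append(span)
                else:
                    rejected.append(span)
            eg["spans"] = valid
        cleaned.append(eg)
    return cleaned, rejected


if __name__ == "__main__":
    # Made-up sample data in the shape of Prodigy annotation tasks
    sample = [
        {"text": "Apple hired Tim", "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"label": "PERSON"},  # no start/end: invalid
        ]},
        {"text": "No spans here"},
    ]
    cleaned, rejected = drop_invalid_spans(sample)
    print("Kept spans:", cleaned[0]["spans"])
    print("Rejected spans:", rejected)
```

You could then pass the output of db.get_dataset("myDataset2") to it and only add the cleaned examples to the new dataset once the rejected spans look like what you expect.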
