NER training on a dataset that was annotated with an older version

Hi! It looks like your data contains misaligned tokens, meaning entity spans whose character offsets don't fall on token boundaries. Older versions of spaCy silently skipped these, but newer versions raise an explicit error. Did you use the same tokenizer during annotation and training?
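To make "misaligned" concrete: a span is only usable if its start offset lands on the start of a token and its end offset lands on the end of a token. So the same annotation can be valid under one tokenizer and invalid under another. The stdlib-only sketch below illustrates this with two toy tokenizers. Both tokenizers are assumptions for illustration (neither is spaCy's real tokenizer), and is_aligned only roughly mirrors the default strict check that Doc.char_span performs.

```python
import re


def whitespace_tokens(text):
    """(start, end) offsets from a naive whitespace tokenizer (toy assumption)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def punct_tokens(text):
    """(start, end) offsets from a toy tokenizer that also splits off punctuation."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def is_aligned(span, tokens):
    """True if the span starts at some token start and ends at some token end."""
    starts = {start for start, _ in tokens}
    ends = {end for _, end in tokens}
    return span["start"] in starts and span["end"] in ends


text = "I like New York."
span = {"start": 7, "end": 15, "label": "GPE"}  # covers "New York"

print(is_aligned(span, punct_tokens(text)))       # True: "York" and "." are separate tokens
print(is_aligned(span, whitespace_tokens(text)))  # False: the last token is "York."
```

If the annotation tool split off the period but the training tokenizer did not (or vice versa), the span above would switch from valid to invalid, which is exactly the kind of mismatch the error is reporting.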

One easy way to find the misaligned examples, see what's wrong, and/or exclude them from your dataset is to load your Prodigy dataset and use spaCy's Doc.char_span method to check that the start and end offsets of every span map to valid token boundaries. If only a few examples are affected, you could simply skip them and save the filtered examples to a new dataset.

import spacy
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("your_dataset_here")  # name of your Prodigy dataset
# Use the same language/tokenizer you annotated with,
# e.g. spacy.load("en_core_web_sm") if you annotated with a trained pipeline
nlp = spacy.blank("en")
for example in examples:
    doc = nlp(example["text"])
    # Examples without entity annotations may have no "spans" key
    for span in example.get("spans", []):
        # Doc.char_span returns None if start/end don't fall on token boundaries
        char_span = doc.char_span(span["start"], span["end"])
        if char_span is None:
            print("Misaligned tokens:", example["text"], span)
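If only a few examples are affected, the "skip them and save the filtered examples to a new dataset" step might look roughly like the sketch below. It reuses the same Doc.char_span check, drops any example with at least one misaligned span, and writes the remaining examples to a new dataset using the Prodigy database methods add_dataset and add_examples. The function names split_by_alignment and save_aligned_examples, and the target dataset name, are hypothetical. Adjust lang (or swap in spacy.load(...)) to match the tokenizer you annotated with.

```python
def split_by_alignment(examples, nlp):
    """Split Prodigy-style examples into (aligned, misaligned) lists.

    An example counts as misaligned if any of its spans can't be mapped
    onto token boundaries, i.e. doc.char_span(start, end) returns None.
    """
    aligned, misaligned = [], []
    for example in examples:
        doc = nlp(example["text"])
        spans = example.get("spans", [])  # examples may have no spans at all
        if all(doc.char_span(s["start"], s["end"]) is not None for s in spans):
            aligned.append(example)
        else:
            misaligned.append(example)
    return aligned, misaligned


def save_aligned_examples(source, target, lang="en"):
    """Copy only the aligned examples from dataset `source` into new dataset `target`.

    spaCy and Prodigy are imported here, so split_by_alignment can be
    reused on its own without either installed.
    """
    import spacy
    from prodigy.components.db import connect

    db = connect()
    examples = db.get_dataset(source)
    nlp = spacy.blank(lang)  # should match the tokenizer used during annotation
    aligned, misaligned = split_by_alignment(examples, nlp)
    for example in misaligned:
        print("Skipping misaligned example:", example["text"])
    db.add_dataset(target)
    db.add_examples(aligned, datasets=[target])
    return len(aligned), len(misaligned)
```

For example, calling save_aligned_examples("your_dataset_here", "your_dataset_here_aligned") would create a new dataset that you could train on instead of the original. Writing to a new dataset rather than deleting from the old one keeps the original annotations around, in case you later want to fix the misaligned spans by hand.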