I am loading pre-annotated data into prodigy ner.train, as I want to reduce the annotation workload. Unfortunately, the software used to annotate the named entities produces entity spans that do not align with the tokens of the spaCy tokenizer I am using.
I get the following error, and Prodigy does not load any more data into the UI. Is there any way I can catch the exception instead, so that examples with misaligned entities are skipped by Prodigy in the annotation step?
ValueError: Mismatched tokenization. Can't resolve span to token index 594. This can happen if your data contains pre-set spans. Make sure that the spans match spaCy's tokenization or add a 'tokens' property to your task.
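
To make the question concrete, here is a rough sketch of the kind of filtering I have in mind, applied to the data before it reaches Prodigy. It assumes each task is a dict with a "text" and a list of "spans" carrying character offsets in "start" and "end". The names `whitespace_offsets`, `is_aligned` and `skip_misaligned` are made up for illustration and are not part of Prodigy or spaCy, and the whitespace tokenizer is only a stand-in so the snippet runs on its own.

```python
import re


def whitespace_offsets(text):
    """Stand-in tokenizer (hypothetical): (start, end) character offsets of
    whitespace-separated tokens. With spaCy, the offsets could instead be
    built as [(t.idx, t.idx + len(t)) for t in nlp.make_doc(text)]."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def is_aligned(span, token_offsets):
    """True if the span starts at a token start and ends at a token end."""
    starts = {start for start, _ in token_offsets}
    ends = {end for _, end in token_offsets}
    return (
        span["start"] < span["end"]
        and span["start"] in starts
        and span["end"] in ends
    )


def skip_misaligned(tasks, tokenize):
    """Yield only the tasks whose pre-set spans all match the tokenization.
    Tasks with at least one misaligned span are dropped entirely."""
    for task in tasks:
        offsets = tokenize(task["text"])
        if all(is_aligned(span, offsets) for span in task.get("spans", [])):
            yield task


# Example: the second task has a span that ends in the middle of "Apple".
tasks = [
    {"text": "Apple hires Bob", "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"text": "Apple hires Bob", "spans": [{"start": 0, "end": 3, "label": "ORG"}]},
]
kept = list(skip_misaligned(tasks, whitespace_offsets))
```

This drops the whole example rather than only the offending span, since that is the behaviour I am after. Whether Prodigy offers a built-in way to do the same is exactly what I am unsure about.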