Hi! Did you use the same Prodigy version for annotation and training, or did you collect the annotations in a previous version?
It’s likely that this is related to the update to spaCy v2.1, which is stricter about gold standard data and constraints for the parser and named entity recognizer. See my reply from this thread:
So you might want to double-check the data and see if you have any “illegal” spans in there. It’s usually pretty rare and removing them should be no problem, because in most cases, they’d be rejected suggestions anyway.
Multi-word entities are no problem – in fact, this is one of the key features of NER.
But spaCy now explicitly raises errors for spans that start or end with whitespace characters, or consist of only whitespace. So "Artificial Intelligence" is totally fine – but an annotated entity for "\nArtificial Intelligence" or "\n" would be invalid.
You can export your dataset by running the db-out command and then check the JSONL file:
prodigy db-out resume_ner > resume_ner.jsonl
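If the file is too large to check by eye, a small standard-library script can list the suspicious spans for you. This is only a rough sketch: the helper names find_whitespace_spans and report are made up for illustration, and it assumes that each line of the export is a JSON object with a "text" key and an optional "spans" list with "start" and "end" character offsets. It also uses Python's str.strip() as the definition of whitespace, which may not match spaCy's check exactly.

```python
import json


def find_whitespace_spans(example):
    """Return the spans whose text is empty, or starts or ends with whitespace."""
    text = example["text"]
    bad = []
    for span in example.get("spans", []):
        span_text = text[span["start"]:span["end"]]
        # strip() removes leading and trailing whitespace, so any difference
        # (or an empty string) means the span boundaries are off.
        if not span_text or span_text != span_text.strip():
            bad.append(span)
    return bad


def report(path):
    """Print the line number, label and text of every suspicious span."""
    with open(path, encoding="utf8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines in the export
            example = json.loads(line)
            for span in find_whitespace_spans(example):
                span_text = example["text"][span["start"]:span["end"]]
                print(line_no, span.get("label"), repr(span_text))
```

Calling report("resume_ner.jsonl") should then print one line per problematic span, with repr() making characters like "\n" visible.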
After you’ve removed the problematic spans or have corrected them, you can then reimport the data to a new dataset:
prodigy db-in resume_ner_fixed resume_ner.jsonl
You can probably also write a script to find the problematic entities automatically and then exclude them, and add the result to a new dataset. I haven’t tested this yet, but something like this should work:
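from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("resume_ner")
fixed_examples = []

def is_whitespace_entity(text):
    whitespace = (" ", "\n")  # etc.
    # str.startswith and str.endswith both accept a tuple of candidates
    if text.startswith(whitespace) or text.endswith(whitespace):
        return True
    for char in whitespace:
        if text == char:
            return True
    return False

for eg in examples:
    new_spans = []
    for span in eg.get("spans", []):
        entity = eg["text"][span["start"]:span["end"]]
        if not is_whitespace_entity(entity):
            new_spans.append(span)  # keep only the valid spans
    eg["spans"] = new_spans
    fixed_examples.append(eg)

# Add the cleaned examples to a new dataset
db.add_dataset("resume_ner_fixed")
db.add_examples(fixed_examples, datasets=["resume_ner_fixed"])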
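The script above drops the offending spans. If you'd rather correct them, a variant could move the character offsets inward until the span no longer starts or ends with whitespace. Again, this is an untested sketch with hypothetical helper names (trim_span, trim_example). It assumes the spans only need their "start" and "end" offsets adjusted; if your spans also carry token indices such as token_start and token_end, those would go stale and need updating separately.

```python
def trim_span(text, span):
    """Return a copy of span with whitespace trimmed from both ends,
    or None if nothing but whitespace is left."""
    start, end = span["start"], span["end"]
    # Move the start offset right past any leading whitespace
    while start < end and text[start].isspace():
        start += 1
    # Move the end offset left past any trailing whitespace
    while end > start and text[end - 1].isspace():
        end -= 1
    if start == end:
        return None  # the span was empty or whitespace only
    trimmed = dict(span)
    trimmed["start"], trimmed["end"] = start, end
    return trimmed


def trim_example(eg):
    """Return a copy of the example with trimmed spans.
    Spans that consist only of whitespace are dropped."""
    fixed = dict(eg)
    spans = (trim_span(eg["text"], s) for s in eg.get("spans", []))
    fixed["spans"] = [s for s in spans if s is not None]
    return fixed
```

You would then collect trim_example(eg) for each example instead of filtering the spans in the loop. Whether trimming or removing is the better choice depends on the data: a rejected suggestion that gets trimmed becomes a rejection of a different span, so it's worth spot-checking the results.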
I’m sorry but I don’t think I understand your question.
We really can’t provide much project-specific advice, as this crosses past questions of how to use Prodigy, into much more general questions around how to solve specific problems with NLP or ML technologies.
If you need urgent help with your project, you might try posting a request to hire a freelancer in the consultants thread: spaCy/prodigy consultants?