ner.train-curve error on whitespace

Hi, we've been running into an error when trying to use ner.train-curve. I found the thread "ner.batch-train after ner.manual results error (ValueError: [E024])" and some other answers (about both Prodigy and spaCy) saying this is caused by tokens that begin or end with whitespace, and that the solution is to remove the bad spans, since they would be "reject" annotations anyway. However, we used only manual labeling for this dataset, so every annotation is an accept and that solution doesn't seem right. Could you help me understand: does Prodigy create tokens in ner.manual that begin or end with whitespace? If so, wouldn't that mean those tokens are unusable for training without additional processing?
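One way to check this for a specific dataset is to look at the exported annotations directly. The sketch below assumes each example is a dict in Prodigy's JSONL style, with a "text" string and a "spans" list whose entries carry "start" and "end" character offsets; find_whitespace_spans is a hypothetical helper, not part of Prodigy. It flags any span whose covered text begins or ends with whitespace, which is the condition the E024 answers describe.

```python
from typing import Iterable, Iterator, Tuple


def find_whitespace_spans(examples: Iterable[dict]) -> Iterator[Tuple[int, dict, str]]:
    """Yield (example index, span, covered text) for spans with leading/trailing whitespace.

    Assumes Prodigy-style records, e.g.
    {"text": "...", "spans": [{"start": 0, "end": 5, "label": "ORG"}]}.
    """
    for i, eg in enumerate(examples):
        text = eg.get("text", "")
        for span in eg.get("spans", []):
            covered = text[span["start"]:span["end"]]
            # If stripping whitespace changes the covered text, a boundary sits on whitespace
            if covered != covered.strip():
                yield i, span, covered


# Hypothetical sample data for illustration
examples = [
    {"text": "Apple is in Cupertino", "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"text": "Visit  Paris now", "spans": [{"start": 5, "end": 12, "label": "GPE"}]},
]
for i, span, covered in find_whitespace_spans(examples):
    print(i, span["label"], repr(covered))
```

If this yields nothing on a purely manual dataset, that would point away from whitespace in the spans themselves and toward something else, such as a tokenization mismatch.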

Follow-up: if I understand correctly, this only affects NER spans, but the Prodigy JSONL format stores references to the original tokens (token_start/token_end) in addition to the character offsets into the text. When I fix the whitespace issue, do I also have to update those token references, or is it enough to adjust the span's start/end character offsets?
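If you do adjust the character offsets, one cautious approach is to recompute the span's token indices from the same example's "tokens" list so the two can never disagree. This is a minimal sketch, assuming each token dict has "start", "end" and "id" keys and each span uses "token_start"/"token_end", as in Prodigy's JSONL; trim_span is a hypothetical helper, not a Prodigy function. It doesn't settle whether training reads the token indices at all; it just keeps them consistent so that question stops mattering.

```python
from typing import Optional


def trim_span(eg: dict, span: dict) -> Optional[dict]:
    """Return a copy of `span` with surrounding whitespace removed.

    Character offsets are shrunk first, then token_start/token_end are
    recomputed from eg["tokens"] (assumed Prodigy-style: "start", "end", "id").
    Returns None if the span is whitespace-only or no longer lines up with tokens.
    """
    text = eg["text"]
    start, end = span["start"], span["end"]
    # Move both boundaries inward past any whitespace characters
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    if start == end:
        return None  # the span covered nothing but whitespace
    tokens = eg.get("tokens", [])
    # Look for tokens whose boundaries match the trimmed offsets exactly
    first = next((t["id"] for t in tokens if t["start"] == start), None)
    last = next((t["id"] for t in tokens if t["end"] == end), None)
    if tokens and (first is None or last is None):
        return None  # trimmed offsets don't align with the stored tokens
    new_span = dict(span, start=start, end=end)
    if tokens:
        new_span["token_start"], new_span["token_end"] = first, last
    return new_span


# Hypothetical example: a whitespace token ended up at the start of the span
eg = {
    "text": "Visit  Paris now",
    "tokens": [
        {"text": "Visit", "start": 0, "end": 5, "id": 0},
        {"text": " ", "start": 6, "end": 7, "id": 1},
        {"text": "Paris", "start": 7, "end": 12, "id": 2},
        {"text": "now", "start": 13, "end": 16, "id": 3},
    ],
}
span = {"start": 6, "end": 12, "label": "GPE", "token_start": 1, "token_end": 2}
print(trim_span(eg, span))
```

Returning None rather than guessing a new boundary means any span that can't be cleanly trimmed gets surfaced for manual review instead of being silently changed.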

I wouldn't expect Prodigy to create entities with leading or trailing whitespace, so maybe the tokenization is mismatched?

The easiest thing would be to find and review the spans it reports as mismatched. Have you been able to print them out and look at them? If not, I can suggest some code that should help find them. As a first step, you could also take a quick look at the dataset with ner.print-dataset, which I usually pipe into less.
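Here is roughly the kind of code I had in mind, as a sketch rather than a finished tool. It assumes the dataset has been exported to JSONL (for example with prodigy db-out) and that you pass in a tokenizer function returning (start, end) character offsets for each token; iter_misaligned_spans, review, simple_tokenize and the file name are all placeholders. Any span whose start or end doesn't fall on a token boundary is printed with some surrounding context, since those are the spans most likely to cause trouble at training time.

```python
import json
import re
from typing import Callable, Iterable, Iterator, List, Tuple

Offsets = List[Tuple[int, int]]


def simple_tokenize(text: str) -> Offsets:
    """Stand-in tokenizer: words and single punctuation marks (NOT spaCy's rules)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def iter_misaligned_spans(
    examples: Iterable[dict], tokenize: Callable[[str], Offsets]
) -> Iterator[Tuple[int, dict]]:
    """Yield (example index, span) for spans whose offsets miss a token boundary."""
    for i, eg in enumerate(examples):
        offsets = tokenize(eg["text"])
        starts = {s for s, _ in offsets}
        ends = {e for _, e in offsets}
        for span in eg.get("spans", []):
            if span["start"] not in starts or span["end"] not in ends:
                yield i, span


def review(path: str, tokenize: Callable[[str], Offsets] = simple_tokenize) -> None:
    """Print every misaligned span in a JSONL export, with context."""
    with open(path, encoding="utf8") as f:
        examples = [json.loads(line) for line in f if line.strip()]
    for i, span in iter_misaligned_spans(examples, tokenize):
        text = examples[i]["text"]
        s, e = span["start"], span["end"]
        # Show the span in brackets with up to 20 characters of context on each side
        print(f"#{i} {span.get('label')}: ...{text[max(0, s - 20):s]}[{text[s:e]}]{text[e:e + 20]}...")
```

To check against the tokenizer you actually train with, you could call review("annotations.jsonl", lambda t: [(tok.idx, tok.idx + len(tok)) for tok in nlp.make_doc(t)]), where nlp is your loaded spaCy pipeline. That way the check reflects the same token boundaries the model sees, rather than the stand-in regex.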