Tokenization misalignment when using ner.llm.fetch and bert.ner.manual

Welcome to the forum @Fangjian :waving_hand:

Not sure if you've seen our docs on annotation for BERT-like transformers, but spaCy v3 takes care of aligning linguistic tokenization (produced by spaCy tokenizers) to BERT tokenization before training.

So unless you specifically need to annotate data that is already BERT-tokenized, you can work with spaCy's default tokenizer in your Prodigy annotation workflows (i.e. `ner.llm.fetch` and `ner.manual`). Once you're done and ready to train a transformer pipeline, you can export your data with `data-to-spacy` and use it for training with spaCy.
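To make that concrete, a rough sketch of the end-to-end workflow on the command line could look something like this. The dataset name `ner_news`, the file paths, the labels and the spacy-llm `config.cfg` are all placeholders for illustration:

```bash
# 1) Pre-annotate with an LLM and cache the suggestions to disk
#    (config.cfg is a hypothetical spacy-llm config defining the NER task)
python -m prodigy ner.llm.fetch config.cfg ./examples.jsonl ./pre_annotated.jsonl

# 2) Review and correct the suggestions using spaCy's default tokenization
python -m prodigy ner.manual ner_news blank:en ./pre_annotated.jsonl --label PERSON,ORG,PRODUCT

# 3) Export the annotations to spaCy's binary training format
python -m prodigy data-to-spacy ./corpus --ner ner_news --lang en

# 4) Train with spaCy; swap in or edit a transformer-based config here
#    if you want a transformer pipeline
python -m spacy train ./corpus/config.cfg \
  --paths.train ./corpus/train.spacy \
  --paths.dev ./corpus/dev.spacy \
  --output ./model
```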
This post walks through each step in more detail.
That is, if you're planning to train a spaCy model. Let me know if that's not the case!