Welcome to the forum, @Fangjian!
Not sure if you've seen our docs on annotation for BERT-like transformers, but spaCy v3 takes care of aligning linguistic tokenization (produced by spaCy tokenizers) to BERT tokenization before training.
So unless you specifically need to annotate data that is already BERT-tokenized, you can work with spaCy's default tokenizer in your Prodigy annotation workflows, e.g. ner.llm.fetch and ner.manual.
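For example, a rough sketch of the annotation step. The dataset name, file paths and labels below are just placeholders, and the ner.llm.fetch step assumes you already have a spacy-llm config set up:

```
# Annotate manually with spaCy's default English tokenizer (blank:en).
# "my_ner_data", ./examples.jsonl and the labels are placeholders.
prodigy ner.manual my_ner_data blank:en ./examples.jsonl --label PERSON,ORG

# Optionally, pre-fetch LLM suggestions to disk first (needs a spacy-llm config),
# then load the fetched examples into ner.manual to review and correct them:
prodigy ner.llm.fetch ./llm_config.cfg ./examples.jsonl ./llm_suggestions.jsonl
prodigy ner.manual my_ner_data blank:en ./llm_suggestions.jsonl --label PERSON,ORG
```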
Once you're done annotating and ready to train a transformer pipeline, you can export your data with data-to-spacy and use it for training with spaCy.
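And a sketch of the export and training steps, again with placeholder names and paths. data-to-spacy also writes out a config.cfg you can train with; note the auto-generated config is typically a CPU pipeline, so if you want a transformer pipeline you'd swap in a transformer-based config (e.g. one created with spacy init config) before training:

```
# Export the Prodigy dataset to spaCy's binary training format.
# ./corpus will contain train.spacy, dev.spacy and a generated config.cfg.
prodigy data-to-spacy ./corpus --ner my_ner_data --eval-split 0.2

# Train with spaCy; add --gpu-id 0 to train a transformer pipeline on GPU.
python -m spacy train ./corpus/config.cfg --output ./output \
    --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy
```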
This post walks through each step in detail. That is, if you're planning to train a spaCy model. Let me know if that's not the case!