Thanks, this is very helpful! I'll play with this in different browsers and see if I can reproduce it
NER models typically predict token-based tags. What those tokens are may differ – sometimes they're linguistically-motivated tokens (what you normally think of as a "word"), sometimes they're chunks like the word piece tokens used by transformer models, which are segmented based on what's most efficient to embed. But you usually want to work with at least some type of token definition, which also makes it easier to use pretrained embeddings. That said, in languages that don't really have the same concept of word = whitespace-delimited chunk (e.g. Chinese), it can make sense to work at the character level instead.
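To make the difference concrete, here's a minimal sketch comparing the two tokenization styles, assuming you have spaCy and Hugging Face `transformers` installed (`bert-base-uncased` is just an example model):

```python
import spacy
from transformers import AutoTokenizer

text = "Tokenization isn't always straightforward."

# Linguistically-motivated tokens (roughly what you'd call "words")
nlp = spacy.blank("en")
print([token.text for token in nlp(text)])
# ['Tokenization', 'is', "n't", 'always', 'straightforward', '.']

# Word piece tokens, segmented for embedding efficiency rather than linguistics
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize(text))
# e.g. ['token', '##ization', 'isn', "'", 't', 'always', 'straightforward', '.']
```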
The character-based highlighting mostly exists because there are some use cases where you might want to highlight individual characters (e.g. character-based implementations or very different types of models that predict characters or segmentation boundaries). But it's not usually something we recommend if you're training a token-based model, because you'll easily end up with annotated spans that don't map to actual tokens and therefore can't be predicted or embedded.
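Here's a minimal sketch (assuming spaCy) of what that misalignment looks like in practice – `Doc.char_span` returns `None` if the character offsets cut into a token:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("We visited New York last week.")

# Character offsets that line up with token boundaries map cleanly
span = doc.char_span(11, 19, label="GPE")  # "New York"
print(span)  # New York

# Offsets that end mid-token can't be mapped onto tokens
bad = doc.char_span(11, 16, label="GPE")   # "New Y"
print(bad)   # None
```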