spaCy Tokenization issue


This is actually a spaCy issue, but I couldn't find a support page for spaCy, so I'm posting it here. Any help would be much appreciated.

I create a doc like this:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("I have $10K")

When I print the tokens in the doc, the output is

["I", "have", "$", "10", "K"]

but I want the output to be the following, ideally using a standard tokenization approach:

["I", "have", "$", "10K"]

Any thoughts on how to achieve this?

Hi! We try to keep this forum very focused on Prodigy – for general usage questions around spaCy, the discussion forum is usually a better place: Discussions · explosion/spaCy · GitHub

Also see the spaCy documentation on adding special case rules and customizing the tokenizer rule sets for reference.
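As a rough illustration of the special case route, here is a minimal sketch. It assumes that a blank English pipeline (spacy.blank("en")) uses the same tokenizer rules as en_core_web_sm, so no model download is needed, and it only covers the literal string "10K". Other amounts such as "25K" would each need their own rule, or a change to the tokenizer's suffix rules instead.

```python
import spacy
from spacy.symbols import ORTH

# Assumption: the blank English pipeline shares its tokenizer rules with
# en_core_web_sm, which keeps this sketch free of a model download.
nlp = spacy.blank("en")

# Register "10K" as a special case so it is kept as a single token
# once the "$" prefix has been split off.
nlp.tokenizer.add_special_case("10K", [{ORTH: "10K"}])

doc = nlp("I have $10K")
tokens = [token.text for token in doc]
print(tokens)
```

A special case must keep the original text intact, so the ORTH values have to join back to exactly "10K". The same add_special_case call can be made on the tokenizer of a loaded en_core_web_sm pipeline.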