Disable extra large spaces

Prodigy appears to render whitespace many times wider than other characters. This makes sense when whitespace is the delimiter, since it makes it easier for annotators to select entities. It does the exact opposite when whitespace is just one of many delimiters, or is not a delimiter at all (i.e. it often occurs inside an entity). For example, the text below, including "VAV T91", looks odd because of the stretching: in this context it makes no sense to treat whitespace any differently from [.-_]. Is there a setting that disables this?

In this case I've generated my own tokens (including tokens that are whitespace) and set ws: "false" for all tokens.
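To make the setup concrete, here is a minimal sketch of how such a pre-tokenized task could be built. The regex and the `make_task` helper are hypothetical and only illustrate the idea; they are not the exact code used here. The sketch assumes Prodigy's usual token layout (`text`, `start`, `end`, `id`, `ws`) and writes `ws` as a JSON boolean.

```python
import json
import re

# Hypothetical tokenizer: runs of letters/digits are one token, and every
# other character (a single whitespace, '.', '-', '_', '/') is its own token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|\s|[^A-Za-z0-9\s]")


def make_task(text):
    """Build a task dict with explicit tokens, all marked ws=False."""
    tokens = []
    for i, match in enumerate(TOKEN_RE.finditer(text)):
        tokens.append({
            "text": match.group(),
            "start": match.start(),  # character offset, inclusive
            "end": match.end(),      # character offset, exclusive
            "id": i,
            # Whitespace is already a token of its own, so no token
            # is followed by implicit whitespace.
            "ws": False,
        })
    return {"text": text, "tokens": tokens}


task = make_task("VAV T91 .Settings.CLG-MAXFLOW")
print(json.dumps(task))  # one line of the .jsonl input file
```

Because every character ends up in exactly one token, joining the token texts reproduces the original string, which is a quick way to check the offsets.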

Hi @david-waterworth,

Thanks for your question.

Can you provide the raw .jsonl file you're using as the source (input)? Also, just to be safe, can you provide the prodigy command you're running (e.g., what tokenizer are you using?) and your prodigy version (i.e., run prodigy stats)?

I tried to recreate your example based on how I think your data is meant to look:

{"text": "NAE55-7 NAE55-7/FC-1. VAV T91 .Settings.CLG-MAXFLOW"}

Notice there's only one whitespace character between VAV and T91.

With this input I don't see the problem, so I suspect I'm not using the same example you are.


Thank you!