We’re using a tokenizer that splits hyphenated words and all punctuation into separate tokens. The Prodigy UI inserts whitespace between all of these tokens, which makes the sentences harder to read. Example:
anti-PD-1/PD-L1 antibodies is displayed as
anti - PD - 1 / PD - L1 antibodies
We need to use our custom tokenizer to enable fine-grained annotation. We have confirmed that the extra whitespace is being introduced by Prodigy and not by our tokenizer, which produces correct character offsets for “start” and “end.” Is there a way to disable the extra whitespace?
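
The offset check described above can be sketched roughly as follows. This is only an illustration: `tokenize` is a hypothetical regex stand-in for the custom tokenizer, and `rebuild` is a hypothetical helper, neither of which is part of Prodigy. It shows that the "start" and "end" offsets are enough to recover the original spacing, while joining the tokens with a single space gives the display shown in the example.

```python
import re


def tokenize(text):
    """Hypothetical stand-in tokenizer: splits off hyphens, slashes and other punctuation."""
    return [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\w+|[^\w\s]", text))
    ]


def rebuild(text, tokens):
    """Reassemble the text, keeping only the whitespace the offsets say is really there."""
    parts = []
    for tok, nxt in zip(tokens, tokens[1:] + [None]):
        parts.append(tok["text"])
        if nxt is not None:
            # The gap between two tokens is whatever lies between their offsets.
            parts.append(text[tok["end"]:nxt["start"]])
    return "".join(parts)


text = "anti-PD-1/PD-L1 antibodies"
tokens = tokenize(text)

# Every token's offsets slice back to exactly its own text.
offsets_ok = all(text[t["start"]:t["end"]] == t["text"] for t in tokens)

# Joining tokens with a space reproduces the hard-to-read display.
spaced = " ".join(t["text"] for t in tokens)

# Using the offsets reproduces the original string.
restored = rebuild(text, tokens)

print(offsets_ok)
print(spaced)
print(restored)
```

Since the offsets alone recover the original string, the tokenizer output already carries the information needed to render the text without the extra spaces.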