Tokenization compatibility issues in rel.manual

If you're happy to annotate the BPE tokens and the relations between them, and don't care so much about aligning the tokens to spaCy's linguistic tokenization, you could also just load in pre-tokenized text produced by your tokenizer. Here's an example using a word piece tokenizer for NER annotation that aligns with a transformer model: https://prodi.gy/docs/named-entity-recognition#transformers-tokenizers

You don't have to do it within the recipe – you could also use the logic as a preprocessing step. One of the key parts here is to set the "ws" key on the tokens, a boolean indicating whether the token is followed by whitespace. Prodigy will use this in the UI to render less whitespace and preserve readability. The relations UI will still draw borders around the tokens, so it might be a bit less pretty for subword tokens – but you'll have alignment.
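As a rough sketch of that preprocessing step: given subword strings with character offsets into the original text (e.g. from a word piece tokenizer's offset mapping), you can build Prodigy-style token dicts and set `"ws"` by checking whether the next character is whitespace. The helper name and the exact offsets here are illustrative, not part of any API:

```python
def make_prodigy_tokens(text, offsets):
    """Build Prodigy-style token dicts from (start, end) character offsets.

    offsets: list of (start, end) spans into `text`, one per subword token.
    """
    tokens = []
    for i, (start, end) in enumerate(offsets):
        tokens.append({
            "text": text[start:end],
            "start": start,
            "end": end,
            "id": i,
            # "ws" is True if the token is followed by whitespace, so the UI
            # can render consecutive subword pieces without a gap between them
            "ws": bool(text[end:end + 1].isspace()),
        })
    return tokens

# Hypothetical subword split of "unbelievable" into "un" + "believable"
text = "unbelievable results"
offsets = [(0, 2), (2, 12), (13, 20)]
task = {"text": text, "tokens": make_prodigy_tokens(text, offsets)}
```

You'd then write one such `task` dict per example to a JSONL file and load it as your input source.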

(Also, thanks for the kind words, glad to hear you like the new relations features :blush:)