Ahhh, from your initial post I just assumed you were doing actual NER annotations and then annotating relations between named entities. If your spans are sentences or even paragraphs, connecting them like this in the relations UI definitely seems inefficient and overkill – it's not about token boundaries or connecting words, you're just linking up these fragments, right? (I would even argue that you might be making your life much harder by highlighting whole sentences in the manual NER interface.)
I think if the goal is to connect these long text fragments, there's not really a benefit in placing arrows on top of the actual inline text. The visual gain from this is absolutely minimal (if not negative) and once you have more than one relation, it's pretty much always going to get messy.
One simple approach I just thought of, which you could easily implement with the existing interfaces: split the content and the alignment/relation annotation into two blocks. That way, you can read the original text on its own and then connect references to the sentences/paragraphs instead of the text itself.
For example, you could split your text into logical units, assign a number, letter or other symbol to each sentence/paragraph/unit, and show the spans you've already assigned labels to. In the relations block, you then annotate those numbers/letters/symbols instead of the blocks of text. It does mean you have to refer back to the original text when you assign the relations, but even with very long texts, you'd end up with a fairly compact, inline representation of the whole document and how its fragments are connected, which could be pretty cool.
Text block:

- 1: First paragraph...
- 2: Second paragraph... [highlighted with label]
- 3: Third paragraph...

Relations block: 1 2 3 (with 2 shown with its span label)
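Here's a rough sketch of that first step in plain Python, assuming your units are paragraphs separated by blank lines. The task keys (`text`, `tokens`, `spans`) are modeled loosely on Prodigy's JSON task format as I understand it, while `unit_map` and the `CLAIM` label used below are made up for illustration, so double-check the format against the docs before relying on it.

```python
from typing import Dict, List


def split_into_units(text: str) -> List[Dict]:
    """Split text on blank lines and number each paragraph, keeping its offsets."""
    units = []
    pos = 0  # character offset where the current chunk starts
    for chunk in text.split("\n\n"):
        stripped = chunk.strip()
        if stripped:
            start = text.index(stripped, pos)
            units.append({"id": str(len(units) + 1), "start": start, "end": start + len(stripped)})
        pos += len(chunk) + 2  # skip the chunk plus the "\n\n" separator
    return units


def make_relations_task(units: List[Dict], span_labels: Dict[str, str]) -> Dict:
    """Build a compact task whose only tokens are the unit numbers."""
    tokens, spans = [], []
    offset = 0
    for i, unit in enumerate(units):
        end = offset + len(unit["id"])
        tokens.append({"text": unit["id"], "start": offset, "end": end, "id": i, "ws": i < len(units) - 1})
        # Carry over the label from the manual span annotation, if there is one
        if unit["id"] in span_labels:
            spans.append({"start": offset, "end": end, "token_start": i, "token_end": i,
                          "label": span_labels[unit["id"]]})
        offset = end + 1  # one space between unit numbers
    return {
        "text": " ".join(unit["id"] for unit in units),
        "tokens": tokens,
        "spans": spans,
        "unit_map": units,  # hypothetical key: unit number -> offsets in the original text
    }
```

For `"First paragraph.\n\nSecond paragraph.\n\nThird paragraph."` with a (hypothetical) `CLAIM` label on paragraph 2, this produces the compact text `1 2 3` with a single span on the `2` token. You'd then write tasks like this out as JSONL and load them into the relations interface.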
If you store the mapping of fragments to numbers/letters/symbols in the underlying JSON, resolving the annotated relations back to whatever you need (character offsets into the text, tokens, etc.) will be trivial.
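Resolving things afterwards could look roughly like this. It's only a sketch, assuming the annotated task comes back with a `relations` list whose `head` and `child` values are token indices (check this against your own exported data), and that you stored a hypothetical `unit_map` key listing each unit's number plus its character offsets in the original text.

```python
from typing import Dict, List


def resolve_relations(task: Dict, text: str) -> List[Dict]:
    """Map relations between unit-number tokens back to spans of the original text."""
    # Look up each unit's character offsets by its number, e.g. "2" -> {"start": 18, "end": 35}
    units_by_id = {unit["id"]: unit for unit in task["unit_map"]}

    def to_span(token_index: int) -> Dict:
        # The token text is the unit number, which keys into the stored mapping
        unit = units_by_id[task["tokens"][token_index]["text"]]
        return {"id": unit["id"], "start": unit["start"], "end": unit["end"],
                "text": text[unit["start"]:unit["end"]]}

    return [
        {"label": rel["label"], "head": to_span(rel["head"]), "child": to_span(rel["child"])}
        for rel in task.get("relations", [])
    ]
```

Because the compact task only ever contains unit numbers, the lookup is a plain dictionary access, and you get back offsets and text for each connected fragment regardless of how long the paragraphs are.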