Annotating dependencies for very long sentences

Hi! It's definitely true that once you have long texts with lots of very long dependencies, the visual gain you get from drawing arcs on top of the text can be low. Allowing every token to be connected to every other token can also add complexity rather than reduce it. This is more of a conceptual problem, and the best solution ultimately depends on the type of annotation and what the dependencies represent.

When you say long sentences, how long are they on average? And what type of dependencies are you annotating? Are you working with syntax where you essentially need to connect every token to a root, or are you labelling different and more sparse annotations?

If you're not annotating syntax, there are various things you can do to reduce the complexity – for example, automatically disabling tokens you don't need (e.g. for coref) or annotating abstract representations (e.g. for sentence alignment).
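
To make the "disabling tokens" idea a bit more concrete, here's a minimal sketch in plain Python. It assumes your tasks are dicts with a `tokens` list, that each token carries a coarse part-of-speech tag under a `pos` key, and that the annotation interface respects a per-token `disabled` flag. The helper name `disable_irrelevant_tokens` and the choice of tags in `KEEP_POS` are made up for illustration, so adjust them to your own data and setup.

```python
# Assumption: for coreference, only nouns, proper nouns and pronouns
# need to be selectable. Adjust this set for your own annotation scheme.
KEEP_POS = {"NOUN", "PROPN", "PRON"}


def disable_irrelevant_tokens(task, keep_pos=KEEP_POS):
    """Return a copy of the task with a "disabled" flag on every token.

    Hypothetical helper: expects task["tokens"] to be a list of dicts
    that each have a "pos" key with a coarse part-of-speech tag.
    """
    tokens = []
    for token in task["tokens"]:
        token = dict(token)  # copy so the input task is left untouched
        token["disabled"] = token.get("pos") not in keep_pos
        tokens.append(token)
    return {**task, "tokens": tokens}


task = {
    "text": "Anna said she left",
    "tokens": [
        {"id": 0, "text": "Anna", "start": 0, "end": 4, "pos": "PROPN"},
        {"id": 1, "text": "said", "start": 5, "end": 9, "pos": "VERB"},
        {"id": 2, "text": "she", "start": 10, "end": 13, "pos": "PRON"},
        {"id": 3, "text": "left", "start": 14, "end": 18, "pos": "VERB"},
    ],
}

result = disable_irrelevant_tokens(task)
selectable = [t["text"] for t in result["tokens"] if not t["disabled"]]
print(selectable)  # ['Anna', 'she']
```

With only the candidate mentions left selectable, there are far fewer possible arcs to consider, which is usually what makes long texts manageable.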

If you are annotating syntactic dependencies (and especially if your goal is to create a proper treebank), there's obviously no way around labelling every token. I still think Prodigy can be useful here: you'll be able to toggle between line wrapping and inline view, hide/show the arcs to get a better overview, and assign dependencies by clicking (instead of dragging). You can see a minimal example of a short sentence here, but the experience will be the same for long sentences. (Ensuring high performance could become a bit trickier if your sentences are longer than ~300 tokens on average, but those would be very long sentences, so I assume yours are a bit shorter than that?)

Btw, I saw you've also emailed us, and we're happy to set you up with an academic license so you can try it out :slightly_smiling_face: