dependency annotation with frequent, very long dependencies

Hi Prodigy team,

We have an ongoing dependency annotation project that is presenting some challenges. We are working with sentences that are in some cases extremely long (200+ tokens) and with the full slate of 45 dependency labels. The source texts come from the financial and legal domains, where even most shorter sentences require drawing long arcs (30-40 tokens). This is the primary challenge, as it places significant cognitive load on the annotators. At its worst, some arcs span over a hundred tokens. Because arcs of such length are difficult to predict, this is exactly the data we are most interested in annotating. Do you have any recommendations for task setup that would ease the burden?

Unless there are existing customization options that would help us (aside from building our own JavaScript front end), below are some suggestions that would make the UI more practical for our use case and annotation team. Apologies if any of these are already available. I looked through the documentation quite a bit but was unable to locate any of these options.

  1. Dependency labels should be modifiable in situ, rather than having to delete and then redraw the dependency with a different label selected.
  2. I don’t know how useful this is broadly, but for convenience it would be nice if there were an option to enforce the constraint that a head have one and only one incoming dependency. Out of the box, and as far as I can tell always, Prodigy allows multiple incoming arcs to a head. There’s no reason that we can’t filter out such annotations post-hoc, but they are not well-formed.
  3. An "undo"/"redo" functionality for edits made would be very useful. Misclicks can be painful to undo, especially with the long sentences that we are working with. Preferably the undo/redo functionality should "remember" on an item-by-item basis, so if an annotator returns to a sentence (that hasn't been saved) and clicks "undo", it would undo the last change on that graph.
  4. Clicking on an arc should highlight the two tokens that the arc relates, to assist rapid identification of the content of the annotation. This is another enhancement with particular benefit for longer sentences.
  5. Wrap for a flat graph is quite convenient for annotators that prefer to work with flat graphs, but some annotators strongly prefer that a graph should have a tree view, beginning at the top with the root and propagating downward recursively beginning with its immediate dependents like
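
The post-hoc filtering mentioned in item 2 could look roughly like the sketch below. It assumes each annotated example carries a `relations` list whose entries have integer `head` and `child` token indices and a `label`; that layout is an assumption about the exported data, and the helper name `find_multi_head_tokens` is hypothetical.

```python
from collections import defaultdict


def find_multi_head_tokens(relations):
    """Map each token index that has more than one incoming arc
    to the list of (head, label) pairs pointing at it."""
    incoming = defaultdict(list)
    for rel in relations:
        # Assumed layout: "head" and "child" are token indices.
        incoming[rel["child"]].append((rel["head"], rel["label"]))
    return {child: arcs for child, arcs in incoming.items() if len(arcs) > 1}


# Toy example: token 0 is attached to both token 1 and token 2.
relations = [
    {"head": 1, "child": 0, "label": "nsubj"},
    {"head": 2, "child": 0, "label": "obj"},
    {"head": 1, "child": 2, "label": "xcomp"},
]
violations = find_multi_head_tokens(relations)
print(violations)
```

An example with an empty result passes the single-head check; anything else can be sent back to the annotator or dropped.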

Thanks in advance for any help.

Hi! These are all good suggestions, thanks for the detailed write-up :100:

It's definitely true that Prodigy's relations UI and workflows aren't optimised for creating actual treebanks where you care about annotating every single dependency label and where you kinda have to accept that the task is fairly abstract and a lot of work and can't easily be semi-automated. You might find that a more specialised library like Brat is just a better fit for this particular use case.

(Only semi-related, but we've actually been thinking about new approaches to syntactic dependency annotation and ways to reimagine the task to make it faster and more efficient. @honnibal always had this vision of a UI and workflow that mimics transition-based parsing. So you can move through the sentence left to right, put tokens on the stack and select the different transitions. We even had a demo of this once in the old displaCy visualizer and it's pretty fast and fun, but also abstract with a significant learning curve for the annotator if they're not familiar with the transition system.)
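
To give a feel for what annotating by transitions involves, here is a minimal sketch of a stack-based transition system. It uses the textbook arc-standard scheme (SHIFT, LEFT, RIGHT) purely as an illustration; this is an assumption and not necessarily the transition system the displaCy demo used. The function name `apply_transitions` is hypothetical.

```python
def apply_transitions(tokens, transitions):
    """Replay (action, label) pairs over a sentence and return the
    resulting arcs as (head_index, child_index, label) tuples."""
    stack = []
    buffer = list(range(len(tokens)))  # token indices, left to right
    arcs = []
    for action, label in transitions:
        if action == "SHIFT":
            # Move the next unread token onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            # Second-from-top becomes a dependent of the top token.
            child = stack.pop(-2)
            arcs.append((stack[-1], child, label))
        elif action == "RIGHT":
            # Top token becomes a dependent of the one beneath it.
            child = stack.pop()
            arcs.append((stack[-1], child, label))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


tokens = ["She", "reads", "books"]
transitions = [
    ("SHIFT", None),
    ("SHIFT", None),
    ("LEFT", "nsubj"),   # reads -> She
    ("SHIFT", None),
    ("RIGHT", "obj"),    # reads -> books
]
arcs = apply_transitions(tokens, transitions)
print(arcs)
```

In a UI built on this idea, the annotator would pick one action per step instead of drawing arcs, which sidesteps dragging across a hundred tokens but requires learning the transition system first.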