Annotating relations/dependencies

Ah, we forgot to add that to the roadmap – it’s currently only listed as “coming soon” in the overview of UI interfaces.

This topic actually came up on the spaCy issue tracker the other day, where I left a comment with a few suggestions for annotating dependencies in Prodigy and creating displaCy visualizations manually.

Ideally, you’d want to move through the tasks as quickly as possible and focus on one dependency at a time. How you solve this task with Prodigy also depends on what you’re trying to achieve – e.g. whether you want to annotate dependencies to improve spaCy’s parser, create a corpus for a new language or a different type of parser (like an intent parser), or find the most frequently misassigned dependencies (as discussed in the spaCy issue mentioned above).

We haven’t tested the active learning component for dependency parsing yet (even though it should be possible with an approach similar to Prodigy’s NER model). But if you don’t need a model in the loop, dependency annotation is definitely already possible – it just requires creative use of the built-in annotation interfaces and some experimentation.

Simple solution

Assuming you already have a model that predicts something, the quickest and simplest solution would be to extract the dependencies and create annotation tasks from them that look like this:

{"text": "like → nsubj → I", "data": {"head": "like", "dep": "nsubj", "child": "I"}}

Using the mark recipe, Prodigy will only show you the "text", add an "answer" key and store the full annotation task in the database – so you’ll always keep a reference to the rest of the information contained in the task, like the "data" etc. If you want more control over the styling, you could either create a "html" task instead and hard-code the markup, or use an HTML template and insert the data as Mustache variables, e.g. {{ data.dep }}.
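To sketch how you might generate those tasks from a parsed Doc: the helper below (make_dep_task is just a hypothetical name, not part of Prodigy) builds one task per dependency in the format shown above – in practice you’d call it in a loop over the tokens of a spaCy Doc, e.g. make_dep_task(token.head.text, token.dep_, token.text), and write the results out as JSONL for the mark recipe.

```python
import json


def make_dep_task(head, dep, child):
    """Build a simple 'mark' task for a single dependency."""
    return {
        "text": f"{head} → {dep} → {child}",
        "data": {"head": head, "dep": dep, "child": child},
    }


# two example dependencies from "I like green apples"
tasks = [
    make_dep_task("like", "nsubj", "I"),
    make_dep_task("like", "dobj", "apples"),
]

# one task per line (JSONL), ready to load with the mark recipe
jsonl = "\n".join(json.dumps(task, ensure_ascii=False) for task in tasks)
```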

displaCy solution

The built-in displacy module in spaCy v2.0 also lets you generate the raw SVG markup. However, if you pass in a Doc object, it will render all dependencies at once, which doesn’t work very well in the Prodigy interface. Instead, you can set manual=True, pass in a dictionary of words and arcs, and generate one visualisation per dependency – this should be fairly easy to do programmatically by iterating over the dependencies and creating one SVG each. For inspiration, here’s how displaCy does this.

from spacy import displacy

deps = {'words': [{'text': 'I'}, {'text': 'like'}, {'text': 'green'}, {'text': 'apples'}],
        'arcs': [{'start': 0, 'end': 1, 'label': 'nsubj', 'dir': 'left'}]}
svg = displacy.render(deps, style='dep', manual=True)

# the SVG can be used as the 'html' key in your annotation task
# alternatively, save the SVG to a file and use an 'image' task instead
task = {'html': svg}

You might have to experiment a bit to see what works best. For example, I’d recommend shortening long sentences to make sure the graphic doesn’t end up too wide.
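To make the "one SVG per dependency" idea concrete, here’s a small sketch of building the manual words-and-arcs payload for a single dependency (single_dep is a hypothetical helper, not a displaCy API). Each payload can then be passed to displacy.render(..., style='dep', manual=True) to produce the SVG for one task.

```python
def single_dep(words, head_i, child_i, label):
    """Build a manual displaCy payload showing only one dependency arc.

    words: list of token texts; head_i/child_i: token indices; label: dep label.
    """
    start, end = sorted((head_i, child_i))
    # displaCy draws the arc from the leftmost to the rightmost token and
    # uses 'dir' to indicate which side the child is on
    direction = "left" if child_i < head_i else "right"
    return {
        "words": [{"text": w} for w in words],
        "arcs": [{"start": start, "end": end, "label": label, "dir": direction}],
    }


# e.g. the nsubj arc between "like" (index 1) and "I" (index 0):
deps = single_dep(["I", "like", "green", "apples"], 1, 0, "nsubj")
```

In a real script, you’d loop over the tokens of a parsed Doc and call this once per token, using token.head.i, token.i and token.dep_.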

Other ideas

In some cases, the choice interface could also be useful – especially if you’ve already narrowed down the selection of relations you need to annotate, and your label set isn’t too large. You can also mix and match interfaces and repurpose them – for example, use "spans" to highlight words in the text, and provide options that define the possible relations. For the intent parser example, a task could look like this:

{
    "text": "find a cafe with great wifi",
    "spans": [{"start": 0, "end": 4}, {"start": 7, "end": 11}],
    "options": [
        {"id": "ROOT", "text": "ROOT"},
        {"id": "PLACE", "text": "PLACE"},
        {"id": "QUALITY", "text": "QUALITY"}
    ]
}

This would give you an annotation card with the words “find” and “cafe” highlighted, and a list of three relation options to choose from.
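Generating those choice tasks programmatically is straightforward – the sketch below (make_choice_task is a hypothetical helper) builds one from a text, a list of character-offset span pairs and a label set:

```python
def make_choice_task(text, span_offsets, labels):
    """Build a 'choice' task highlighting spans, with relation options.

    span_offsets: list of (start, end) character offsets into text
    labels: relation label IDs, shown as the selectable options
    """
    return {
        "text": text,
        "spans": [{"start": s, "end": e} for s, e in span_offsets],
        "options": [{"id": label, "text": label} for label in labels],
    }


task = make_choice_task(
    "find a cafe with great wifi",
    [(0, 4), (7, 11)],  # "find" and "cafe"
    ["ROOT", "PLACE", "QUALITY"],
)
```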

If you end up trying one of the ideas, definitely keep us updated! (I’m actually really looking forward to implementing the Prodigy + displaCy integration now :blush:)
