Displaying Span/Token Metadata

I've not used Prodigy yet so I apologize if there's a straightforward answer to this question that I missed somewhere; I couldn't find an obvious answer in the documentation.

I'm hoping to use Prodigy to annotate relation data in sentences with grammatical errors. In terms of actual annotation functionality, it looks like the relations interface will work perfectly. The issue is that I would like to display information about corrections made to the sentence to help inform how these relations are annotated. I'm hoping for an end result where the parts of the sentence that were corrected are highlighted or otherwise visually distinct, and the content of each correction is visible either directly under it or in a popup on mouseover. More generally, I'm hoping to assign text metadata to specific tokens/spans and have it visible while annotating.

Basically I'd like to be able to pass in something like this:

{
  "text": "My friend is always to run",
  "tokens": [
    {"text": "My", "start": 0, "end": 2, "id": 0, "ws": true},
    {"text": "friend", "start": 3, "end": 9, "id": 1, "ws": true},
    {"text": "is", "start": 10, "end": 12, "id": 2, "ws": true},
    {"text": "always", "start": 13, "end": 19, "id": 3, "ws": true},
    {"text": "to", "start": 20, "end": 22, "id": 4, "ws": true},
    {"text": "run", "start": 23, "end": 26, "id": 5, "ws": false}
  ],
  "spans": [
    {"start": 20, "end": 26, "token_start": 4, "token_end": 5, "correction": "running"}
  ],
  ...
}

And have some way to display "running" as the correction for "to run", without it being an actual span label for the purposes of training or being changeable/interactive in any way during annotation. The actual annotation component is straightforward span/relation annotation that seems like it should be easy with Prodigy's existing features.

Is there an easy way to do something like this? I see that there's a metadata field, but it appears to be for passing in metadata about the entire sentence/document that gets displayed in the lower right-hand corner. I'm hoping there's a relatively straightforward way to attach metadata to tokens and/or spans.

Hi! From what you describe, using the "label" key here could actually work and might be the easiest solution. It would display the text nicely below the token(s) it refers to. If you're not annotating relations and spans jointly, the span labels will be static and not editable, so they'll just function as a visual guide. And you don't have to actually use this for training later on – you could just strip out the spans or span labels afterwards or just use the "relations" data.
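As a rough sketch of the "label" approach, the custom "correction" value from the example above could be copied into "label" before the tasks are loaded, and the display-only spans dropped again after annotation. The helper names `corrections_to_labels` and `strip_display_spans` are made up for illustration, and "correction" is the custom key from the question, not something Prodigy itself knows about.

```python
import copy

def corrections_to_labels(task):
    """Return a copy of the task where each span's custom "correction"
    value is also set as its "label", so it can be rendered under the span."""
    task = copy.deepcopy(task)
    for span in task.get("spans", []):
        # Only touch spans that carry a correction and have no label yet.
        if "correction" in span and "label" not in span:
            span["label"] = span["correction"]
    return task

def strip_display_spans(task):
    """Return a copy of an annotated task without the display-only spans
    (those that carry the custom "correction" key)."""
    task = copy.deepcopy(task)
    task["spans"] = [s for s in task.get("spans", []) if "correction" not in s]
    return task

example = {
    "text": "My friend is always to run",
    "spans": [
        {"start": 20, "end": 26, "token_start": 4, "token_end": 5,
         "correction": "running"}
    ],
}

labeled = corrections_to_labels(example)
cleaned = strip_display_spans(labeled)
```

Here `labeled` is what would go into the annotation stream, and `cleaned` shows how the helper spans could be removed before using the data for anything else.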

Alternatively, the "meta" field could be an option as well – but it will be more separate. Anything you put in "meta" will be displayed in the bottom right corner, as static meta information. So you could set up custom keys in it programmatically, like "run": "running".
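A minimal sketch of setting those "meta" keys programmatically, assuming the task carries the custom "correction" key on its spans as in the example above. The function name `corrections_to_meta` is hypothetical; this version uses the full original span text as the key, but any key scheme would do.

```python
def corrections_to_meta(task):
    """Return a copy of the task with one "meta" entry per corrected span,
    mapping the original span text to its correction."""
    text = task["text"]
    meta = dict(task.get("meta", {}))  # keep any existing meta entries
    for span in task.get("spans", []):
        if "correction" in span:
            original = text[span["start"]:span["end"]]
            meta[original] = span["correction"]
    return {**task, "meta": meta}

example = {
    "text": "My friend is always to run",
    "spans": [
        {"start": 20, "end": 26, "token_start": 4, "token_end": 5,
         "correction": "running"}
    ],
}

with_meta = corrections_to_meta(example)
print(with_meta["meta"])
```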

I've honestly not figured out how much span annotating I'll need to be doing (definitely relations though), but in any case it sounds like the reasonable first step is to try this using the label key. Thanks very much!

Edit: Just in case anyone else has a similar issue with GEC-related work, I ultimately resolved this by just using the diff view. The labels idea would have been perfect if I didn't occasionally need to annotate span labels as well (i.e. if I were just doing relations), but it turns out that I do, so the diff view was a better way to communicate the same information.
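For anyone trying the same thing, here is a rough sketch of how the corrected sentence could be derived from the span data in my original example. The "removed"/"added" keys are what I understand the diff view to expect, so treat them as an assumption and check the docs for the interface; `apply_corrections` and `to_diff_task` are just illustrative helper names.

```python
def apply_corrections(text, spans):
    """Apply each span's "correction" to the text by character offsets.
    Spans are processed right to left so earlier offsets stay valid."""
    corrected = text
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        if "correction" in span:
            corrected = (
                corrected[:span["start"]]
                + span["correction"]
                + corrected[span["end"]:]
            )
    return corrected

def to_diff_task(task):
    """Build a task with the original and corrected sentence.
    The "removed"/"added" key names are an assumption about the diff view."""
    return {
        "removed": task["text"],
        "added": apply_corrections(task["text"], task.get("spans", [])),
    }

example = {
    "text": "My friend is always to run",
    "spans": [
        {"start": 20, "end": 26, "token_start": 4, "token_end": 5,
         "correction": "running"}
    ],
}

diff_task = to_diff_task(example)
```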
