Using relations interface for large texts

Hi,

I am trying to label snippets of text (1 to 4 sentences each) and subsequently the relations between these snippets, for a corpus of relatively long texts (roughly 500 words each).

As suggested in the documentation, I first label the snippets of text and then add the relations between the labelled snippets.

Labelling the snippets works really well using the NER-interface, but when trying to add the relations (using the Relations interface) between the snippets of text, the text becomes almost unreadable.

Would it be possible (in the near future) to add relations using the NER interface instead of the relations interface (with all the blocks around each word)? Or is there a setting I can change that keeps the original formatting and highlights the labelled snippets, as in the NER interface?

Regards,
Thijs

Hi! In the default configuration, the relations UI allows adding relations between all tokens, in all directions, and multiple connections per token. So the interface needs to be able to display and represent any of these annotations, which is why it works/looks the way it does. In order to make the annotation process more efficient, the relations workflows allow disabling tokens, so you can only focus on the relevant spans (e.g. your entities).

So if you already know which tokens can be part of a relation and which tokens can't (for instance, if you're loading in pre-labelled examples with entities and only want to annotate relations between entities), you typically want to add some disable patterns for all other tokens. In your case, that could be something like "ENT_TYPE": {"NOT_IN": ["PERSON", "ORG"]}. Here's an example of using disable patterns: https://prodi.gy/docs/api-interfaces#ner_manual
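As a rough sketch of what such a patterns file might contain, the snippet below serialises the pattern from above as JSONL. The one-object-per-line layout with a "pattern" key is an assumption based on the usual match-pattern file format, and the helper name `to_jsonl` is made up for illustration; check the docs for the exact format and for how the file is passed to the recipe.

```python
import json

# Token-level match pattern from the reply above: it matches every token
# whose entity type is NOT one of the labels that should stay selectable.
disable_pattern = [{"ENT_TYPE": {"NOT_IN": ["PERSON", "ORG"]}}]


def to_jsonl(patterns):
    """Serialise patterns as JSONL: one {"pattern": [...]} object per line.

    The "pattern" key is an assumed file layout, not confirmed by the thread.
    """
    return "\n".join(json.dumps({"pattern": p}) for p in patterns) + "\n"


jsonl = to_jsonl([disable_pattern])
print(jsonl, end="")
# The string can then be written to a file of your choosing,
# for example with pathlib.Path(...).write_text(jsonl).
```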

Hi Ines,

Thanks a lot for your quick answer. Disabling the tokens does not really make the text more readable, so I will try to explain my use case a bit better. I have rulings by judges in which certain paragraphs entail other paragraphs (see Picture 1). What I am trying to do is label the spans of different paragraphs, i.e. entailment fragments and entailing paragraphs (similar to certain tasks in this competition). There are several of these entailment relationships in one case, and one entailment fragment can result in multiple different entailing paragraphs.

In my case (dummy example), labelling the paragraphs looks a bit like Picture 2. I would also like to annotate the relationships, but at the moment that looks like Picture 3 in Prodigy, which is not really usable. What I would like to do is add relationships while keeping the view as in Picture 2.

Picture 1 (source)

Picture 2 (dummy example, span labeling)

Picture 3 (dummy example, relation labeling)

Ahhh, from your initial post I just assumed you were doing actual NER annotations and then annotating relations between named entities. If your spans are sentences or even paragraphs, connecting them like this in the relations UI definitely seems inefficient and overkill – it's not about token boundaries or connecting words, you're just linking up these fragments, right? (I would even argue that you might be making your life much harder by highlighting whole sentences in the manual NER interface.)

I think if the goal is to connect these long text fragments, there's not really a benefit in placing arrows on top of the actual inline text. The visual gain from this is absolutely minimal (if not negative) and once you have more than one relation, it's pretty much always going to get messy.

One simple approach I just thought of that could work and that you could easily implement with the existing interfaces: split the content and alignment/relation annotation into two blocks so you can read the original text separately and then connect references to the sentences/paragraphs instead of the actual text.

For example, you could split your text into logical units, assign numbers, letters or other symbols to each sentence/paragraph/unit, and show the spans that you already have assigned labels to. In the relations block, you then annotate those numbers/letters/symbols instead of the blocks of text. It does mean you have to refer back to the original text when you assign the relations – but even with very long texts, you'd end up with a fairly compact, inline representation of the whole document and how its fragments are connected, which could be pretty cool.

Block 1: html

  1. First paragraph...
  2. Second paragraph... [highlighted with label]
  3. Third paragraph...

Block 2: relations

[1] [2, with span label] [3]
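The two blocks sketched above could be built from a list of paragraphs roughly like this. It is a minimal sketch, not a tested recipe: the task keys ("html", "text", "tokens", "spans") and the blocks config follow the usual task format as I understand it, while `make_task`, the "fragments" and "original_text" keys, and the FRAGMENT label are made up for illustration.

```python
from html import escape


def make_task(paragraphs, labels):
    """Build one task: numbered HTML for reading, numbers as tokens for linking.

    paragraphs: list of paragraph strings
    labels: dict mapping paragraph index -> span label (labelled ones only)
    """
    html_items, tokens, spans, fragments = [], [], [], []
    doc_offset = 0  # character offset into the original text
    ref_offset = 0  # character offset into the short "1 2 3" reference text
    for i, para in enumerate(paragraphs):
        symbol = str(i + 1)
        label = labels.get(i)
        body = escape(para)
        if label:
            body = f"<mark>{body} <b>[{escape(label)}]</b></mark>"
        html_items.append(f"<li>{body}</li>")
        start, end = ref_offset, ref_offset + len(symbol)
        tokens.append({"text": symbol, "start": start, "end": end,
                       "id": i, "ws": i < len(paragraphs) - 1})
        if label:
            spans.append({"start": start, "end": end, "token_start": i,
                          "token_end": i, "label": label})
        # Hypothetical custom key: where this fragment sits in the original text.
        fragments.append({"token_id": i, "symbol": symbol,
                          "start": doc_offset, "end": doc_offset + len(para)})
        doc_offset += len(para) + 2  # paragraphs are joined by "\n\n"
        ref_offset = end + 1         # symbols are joined by a single space
    return {
        "html": "<ol>" + "".join(html_items) + "</ol>",
        "text": " ".join(t["text"] for t in tokens),
        "tokens": tokens,
        "spans": spans,
        "fragments": fragments,
        "original_text": "\n\n".join(paragraphs),
    }


# Assumed config for a custom recipe that combines the two views.
blocks_config = {"blocks": [{"view_id": "html"}, {"view_id": "relations"}]}

task = make_task(
    ["First paragraph...", "Second paragraph...", "Third paragraph..."],
    {1: "FRAGMENT"},
)
print(task["text"])
```

The relations block then only ever shows the short reference text, so the arrows stay readable no matter how long the paragraphs are.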

If you store the mapping of fragments to numbers/letters/symbols in the underlying JSON, resolving the annotated relations back to whatever you need (character offsets into the text, tokens, whatever) will be trivial.
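To illustrate, here is a minimal sketch of that resolution step. It assumes each annotated relation refers to its head and child by token index, and that the task carries a custom "fragments" list mapping token indices to character offsets; that key, the function name, and the ENTAILS label are hypothetical.

```python
def resolve_relations(task):
    """Map relations between reference tokens back to character offsets."""
    by_token = {f["token_id"]: f for f in task["fragments"]}
    resolved = []
    for rel in task.get("relations", []):
        head = by_token[rel["head"]]
        child = by_token[rel["child"]]
        resolved.append({
            "label": rel["label"],
            "head": (head["start"], head["end"]),
            "child": (child["start"], child["end"]),
        })
    return resolved


original_text = "First paragraph...\n\nSecond paragraph...\n\nThird paragraph..."

# Toy annotated example: paragraph 2 is linked to paragraph 3.
annotated = {
    "fragments": [
        {"token_id": 0, "start": 0, "end": 18},
        {"token_id": 1, "start": 20, "end": 39},
        {"token_id": 2, "start": 41, "end": 59},
    ],
    "relations": [{"head": 1, "child": 2, "label": "ENTAILS"}],
}

resolved = resolve_relations(annotated)
for rel in resolved:
    head_text = original_text[slice(*rel["head"])]
    child_text = original_text[slice(*rel["child"])]
    print(f"{head_text!r} -{rel['label']}-> {child_text!r}")
```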

Thanks for the elaborate answer! That seems like a decent way to go.
