newlines in relations annotation

Hello,
First of all, thank you very much for all the new features :smiling_face_with_three_hearts: :heart_eyes:

In our case we try to extract custom relations between entities that can be found in long sections of text.

The text may contain several paragraphs separated by one or more newlines. Currently, when we use rel.manual with the --wrap parameter over the entities that we had already annotated in ner.manual, the layout information is lost, which makes it harder to annotate the relationships.

Would it be possible to visualize the line breaks in rel.manual, similar to how it is done in ner.manual?

Would it also be possible to customize the size of the space between lines when we use this interface?

Thank you very much for the great work, as always! We are very excited to experiment with all the new recipes.

Thanks, glad to hear the new features are useful :blush:

Oh, I totally thought it already did that :thinking: Let me look into this, it's definitely something we want and should be easy to add. So I'll put this on my list for the next release.

Yes, there are two theme settings you can adjust: relationHeight (maximum height if line wrapping is disabled) and relationHeightWrap (maximum height with line wrapping). Also see the "Web Application" page of the Prodigy documentation.
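For anyone who wants a starting point, here is a minimal sketch of how these two settings might be placed under the "custom_theme" key of prodigy.json. The numeric values are placeholders I made up, not recommended or default values, so adjust them to taste.

```python
import json

# Theme overrides for the relations UI. The two keys come from the reply
# above; the numbers are arbitrary placeholders, not Prodigy defaults.
custom_theme = {
    "relationHeight": 150,     # max height when line wrapping is disabled
    "relationHeightWrap": 40,  # max height when line wrapping is enabled
}

# Fragment to merge into your prodigy.json.
config = {"custom_theme": custom_theme}
print(json.dumps(config, indent=2))
```

The printed JSON can be merged by hand into an existing prodigy.json.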

Thank you for the quick response! I just tried the relationHeightWrap and relationHeight attributes, and both worked well for customizing the line spacing.

This is also a feature I'd like to use.

Here is what a simple "hello\nworld" looks like in "ner.manual" view:

and what the same "hello\nworld" looks like in "relations" view:

Cheers,

Hervé.

Update: Just released v1.10.5, which now correctly uses the symbols for newlines and tabs in the relations UI :smiley:

Awesome, thanks!

Hi @ines, I just updated to v1.10.5 and newlines are now displayed correctly with the ↵ character, thanks a lot! However, it would be very useful if the associated line breaks were also rendered in the interface layout. Would it be possible to reproduce behaviour similar to the hide_newlines NER config option in this interface?

I just checked and, indeed, newlines now appear as ↵ in relations. Thanks.

However, I confirm what @AlejandroJCR describes:
no actual line break is added.

It just shows

hello  ↵  world

instead of

hello  ↵  
world

even when hide_newlines is set to False.

This is similar to what I experienced with the ner_manual view. If I tokenised \n\t as a single token, it displayed the control character symbols for both and added the newline (i.e. the text was split over 2 lines), but didn't indent the next line. If I tokenised them as two separate tokens, \n and \t, it seemed to work.
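To illustrate the "separate tokens" variant, here is a rough sketch of a whitespace-aware tokenizer that emits every \n and \t as its own token. The function name and the token dict layout (id, text, start, end) are my own assumptions for illustration; this is not the tokenizer Prodigy or spaCy uses.

```python
import re

# Each newline and each tab becomes its own token; any other run of
# non-whitespace characters is one token. Plain spaces are skipped.
_TOKEN_RE = re.compile(r"\n|\t|\S+")

def tokenize_keep_breaks(text):
    """Split text so that "\n" and "\t" are never merged into one token."""
    return [
        {"id": i, "text": m.group(), "start": m.start(), "end": m.end()}
        for i, m in enumerate(_TOKEN_RE.finditer(text))
    ]

tokens = tokenize_keep_breaks("hello\n\tworld")
print([t["text"] for t in tokens])
```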

Yes, that's currently expected – the relations UI just adds the symbols at the moment, it doesn't add actual line breaks. It should be possible to add line breaks for newline-only tokens and re-render the tokens accordingly, but I haven't looked into that yet. (Not sure how to solve newlines within tokens/spans in the relations UI, though, that's going to be quite difficult to visualise.)

The idea of the tab symbol is to have it replace the actual \t (like it's done in Word etc.)

Hi Ines,

I'm facing the same challenge with my relation extraction annotation step: it turns out to be very hard to interpret my texts without the proper line breaks that I was used to during the NER tagging step.

You said that "It should be possible to add line breaks for newline-only tokens and re-render the tokens accordingly", but that's not something we as end users can do by tweaking JavaScript or CSS in the custom_theme config, right?

Is this something that you may consider adding in a future version of Prodigy?

If you have any suggestions for alternative solutions using custom recipes, UIs etc., I would be happy to hear them too.

Many thanks

The closest I came to a workable solution is to pad all newlines with enough spaces so the token takes up the whole width:

I would also need to recalculate the indexes of the pre-annotated entities once before and once after the annotation step.
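A rough sketch of what I mean, assuming the entities are stored as character-offset spans with "start" and "end" keys. The names pad_newlines, restore_spans and width are hypothetical helpers of mine, not part of Prodigy, and token-based indices would need to be handled separately.

```python
def pad_newlines(text, spans, width=80):
    """Append `width` spaces after every newline and shift span offsets."""
    pieces, offset_map, new_pos = [], [], 0
    for ch in text:
        offset_map.append(new_pos)  # new index of this original character
        piece = "\n" + " " * width if ch == "\n" else ch
        pieces.append(piece)
        new_pos += len(piece)
    offset_map.append(new_pos)      # lets an exclusive end == len(text) map too
    shifted = [
        {**s, "start": offset_map[s["start"]], "end": offset_map[s["end"]]}
        for s in spans
    ]
    return "".join(pieces), shifted, offset_map

def restore_spans(spans, offset_map):
    """Map offsets in the padded text back to the original text."""
    reverse = {new: old for old, new in enumerate(offset_map)}
    return [
        {**s, "start": reverse[s["start"]], "end": reverse[s["end"]]}
        for s in spans
    ]

text = "hello\nworld"
spans = [{"start": 0, "end": 5, "label": "A"}, {"start": 6, "end": 11, "label": "B"}]
padded, padded_spans, mapping = pad_newlines(text, spans, width=3)
```

restore_spans only works for offsets that fall on original characters; a span that starts or ends inside the padding raises a KeyError, which at least makes the problem visible.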

@ines could you give some insight into whether this issue could be solved in Prodigy in the short term? Maybe I can wait (and avoid the above workaround) if the timing permits.

Thx

I can definitely implement the "true" newlines for the future, that should be no problem. It just needs some experimentation, because this UI is a bit more complex than the others and canvas-based. (The relations UI was originally designed without newlines in mind, and I later added the icons for invisible characters, so you're not just shown empty tokens.)

Cool, thanks in advance! I will keep an eye on this topic then.

Hello Ines,
I'm facing the same problem as @AlejandroJCR and @hbredin in the "relations" view. I am currently trying to annotate dialogues, and it would be easier to have a separate line per speaker.
I'll keep an eye on this topic too, thank you :slight_smile:

I'm also interested in having newlines in the relationship interface!

Ah, almost forgot to update this thread: We released v1.10.6 yesterday, which includes support for "real" newlines in the relations UI. The newlines are added if wrapping is enabled and they currently collapse if you disable line wrapping.

Hey everyone, thank you very much for adding this new feature. It correctly displays the true newline tokens for each annotation. However, I have noticed that after raising the token limit, I can no longer render long documents that span several paragraphs: the browser either throws an error or freezes. This used to work in previous versions.

Hmm, that's interesting :thinking: I don't see how this could be related but then again, I don't think anything else changed in the interface. How do you have it configured? Are you using line wrapping or not? And how long are your documents (number of tokens)?

Yes, I'm using the wrapping option. My documents are around 500-1500 tokens long, with roughly normally distributed lengths. For now, token_limits > 750 are not working.

In case it's useful: I already have a model in the loop that makes predictions for each annotation, so the tasks are pre-populated from the start. My use case has 21 entity types that can take part in relations and 9 relation types, so these are heavily populated tasks. Could this cause performance issues in the browser?