Relation annotation is responding very slowly

Hi,
I have a major question about relation annotation responding very slowly in my web UI, and a minor question about visualizing Chinese tokenization (in relation annotation).

Q1
The relation annotation front end responds very slowly to my operations, like moving and clicking, so my annotation cannot proceed. This happens when I have a rather long text (each document has thousands of words). If I force-split the document, the problem is greatly eased. However, this is not acceptable, because I have to annotate relations that span different sentences, and there is no valid position to split the text. (Another workaround is to split the document with overlapping sentences, but this causes extra effort, because I would have to annotate the overlapping sentences twice.)

Some info that might be helpful:

python3 -m prodigy rel.manual re1 blank:zh dataset:demo1 --label=Relation.label --wrap

Q2
I'm annotating Chinese text, so the text is not correctly tokenized and visualized as such. I tried loading a spaCy pipeline (produced with spaCy's to_disk, and which tokenizes my text correctly), but this changes nothing: the visualization I see in the web app is still the same as shown above (only pre-labeled entities are tokenized correctly, everything else remains plain text).
This is how I create my tokenizer model:

from spacy.lang.zh import Chinese

# Use the jieba segmenter instead of the default character segmentation
cfg = {"segmenter": "jieba"}
nlp = Chinese.from_config({"nlp": {"tokenizer": cfg}})
nlp.to_disk("jieba_tokenizer")

This is how I load it in my relation annotation command:

python3 -m prodigy rel.manual re1 jieba_tokenizer dataset:demo1 --label=Relation.label --wrap

Is this a problem with rel.manual, or am I doing something wrong? Could you point me to the relevant documentation or an example?

Thanks!

Benfeng

Hi and sorry about this – it's currently expected that the interface becomes less performant for very long documents with many tokens, and we're working on a rewrite that doesn't have this problem. I think what makes it additionally tricky in your case is that you just end up with more tokens overall, due to the way the characters are segmented.

As a workaround, one thing you can do is use patterns to disable all tokens that you know won't ever be part of a relation – of course, only if that's possible. Obvious candidates are punctuation, but you might be able to write other disable patterns based on part-of-speech tags etc.
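For example, the patterns file is just JSONL with one spaCy Matcher pattern per line, passed in via --disable-patterns. A minimal sketch of generating such a file (the punctuation and whitespace patterns here are only examples – adjust them to your data):

import srsly

# Matcher patterns describing tokens that should never be selectable
disable_patterns = [
    {"pattern": [{"is_punct": True}]},  # punctuation
    {"pattern": [{"is_space": True}]},  # whitespace-only tokens
]
srsly.write_jsonl("disable_patterns.jsonl", disable_patterns)

You'd then add --disable-patterns disable_patterns.jsonl to your rel.manual command.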

Can you double-check that when you load your custom pipeline with the tokenizer in Python and process a text, the tokens are segmented correctly? If a Doc produced by the model shows the correct tokens, Prodigy should reflect this accordingly in all recipes that use the model for tokenization.
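For example, a quick check could look like this (assuming the jieba_tokenizer directory from your snippet above):

import spacy

# Load the pipeline directory saved with nlp.to_disk("jieba_tokenizer")
nlp = spacy.load("jieba_tokenizer")

doc = nlp("我爱北京天安门")  # replace with a sentence from your data
print([token.text for token in doc])
# With jieba, this should print word-level tokens like
# ['我', '爱', '北京', '天安门'], not one token per character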

Hi @ines, what is the timeline like for the rewrite that you mentioned?

I can't give you a specific ETA at this point yet, sorry! But it's definitely something we have on our list of enhancements.

I have the same issue: the interface becomes very slow (unusable) with long documents in rel.manual.
Any updates?

The issue occurs at least with the hover event on tokens. @ines, would it be possible to edit some JS and disable the features that cause this, or is this core code?
I am working on long-range dependency problems. I have tested LabelStudio and its relation annotation works fast, but the software has other drawbacks.

It's unfortunately not as simple as that, because it all ties into how the interface is implemented, so we need to refactor the interface using a different technology, which is currently in progress. The easiest workaround at the moment, if you want to work with long dependencies, is to limit the possible connections as much as you can by disabling tokens, e.g. those that you know won't be relevant. So if you're annotating relations between named entities, you can disable all tokens that are not part of an entity.
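For example, in a custom recipe you could post-process the stream along these lines – a rough sketch, assuming your tasks follow the usual format (tokens with an "id", spans with "token_start"/"token_end") and relying on the same "disabled" flag that the disable patterns mechanism sets:

def disable_non_entity_tokens(stream):
    # Mark every token that isn't covered by a span as disabled,
    # so it can't be selected in the relations UI
    for eg in stream:
        covered = set()
        for span in eg.get("spans", []):
            covered.update(range(span["token_start"], span["token_end"] + 1))
        for token in eg.get("tokens", []):
            if token["id"] not in covered:
                token["disabled"] = True
        yield eg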

I disabled all the tokens that are not part of an entity, and the time to create one relation is about 8 seconds (in a document with 6k tokens and 500 spans) :frowning:

Great to hear that the refactor is in progress!