Hi,
I have a major question about the relation annotation interface responding very slowly in my browser, and a minor question about visualizing Chinese tokenization (also in relation annotation).
Q1
The relation annotation front end responds very slowly to my actions, such as moving the mouse and clicking, so I cannot proceed with my annotation. This happens when the text is rather long (each document has thousands of words). If I forcibly split the document, the problem is greatly reduced. However, this is not acceptable, because I have to annotate relations that span different sentences, and there is no valid position at which to split the text. (Another workaround is to split the document into chunks with overlapping sentences, but this causes extra effort, because I have to annotate the overlapping sentences twice.)
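For reference, the overlapping split I mean would look roughly like the sketch below. The function name `split_with_overlap` and the `size`/`overlap` parameters are only for illustration (they are not part of Prodigy), and it assumes the document has already been split into sentences.

```python
from typing import List


def split_with_overlap(sentences: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split sentences into chunks of `size`, where consecutive
    chunks share `overlap` sentences."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(sentences[start:start + size])
        # Stop once a chunk reaches the end of the document,
        # so no trailing chunk consists only of repeated sentences.
        if start + size >= len(sentences):
            break
    return chunks


if __name__ == "__main__":
    doc = ["s1", "s2", "s3", "s4", "s5"]
    for chunk in split_with_overlap(doc, size=3, overlap=1):
        print(chunk)
```

With this approach every sentence in an overlap region appears in two chunks, which is exactly the duplicated annotation effort I would like to avoid.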
Some info that might be helpful:
- My front end looks like this: [screenshot]
- My launch command:
python3 -m prodigy rel.manual re1 blank:zh dataset:demo1 --label=Relation.label --wrap
Q2
I'm annotating Chinese text, and the text is not correctly tokenized or displayed as tokens. I tried loading a spaCy pipeline (saved with to_disk, and which tokenizes my text correctly), but this changes nothing: the visualization in the web app is still the same as shown above (only pre-labeled entities are tokenized correctly; everything else remains plain text).
This is how I create my tokenizer model:
from spacy.lang.zh import Chinese
cfg = {"segmenter": "jieba"}
nlp = Chinese.from_config({"nlp": {"tokenizer": cfg}})
nlp.to_disk("jieba_tokenizer")
This is how I load it in my relation annotation service:
python3 -m prodigy rel.manual re1 jieba_tokenizer dataset:demo1 --label=Relation.label --wrap
Is this a problem with rel.manual, or am I doing something wrong? Could you point me to the relevant documentation or an example?
Thanks!
Benfeng