I have an odd use case that requires me to label content within entire academic articles, which are far more than 500 words in length. I've noticed that the labeling is pretty slow, often taking a second or so to 'snap' to a phrase that I'm labeling. I suspect my use case is out of the ordinary and I'll have to live with UI slowness, but is there anything I can try to improve its responsiveness?
Hi! Which annotation interface are you using? Are you using the relations UI? If so, that can currently become slower with larger documents (the specifics also depend on your browser and machine). I definitely want to work on improving the performance here – the tricky part is that the interface is quite complex, so we might need a better way to represent the tokens and spans (which technically all have to be able to be connected to any other token or span).
If you're using any other text-based interface like ner_manual, there shouldn't be any performance differences, even with huge texts, since those interfaces render plain text and rely on the browser's native highlighting. If you do end up having problems there, that'd definitely be interesting to investigate, because it might indicate that something else is going on!
Interesting, glad I raised this then. I'm using ner_manual and experience a significant performance difference.
Here's an example of a long text where labeling is delayed in the UI (note: I just wrapped the raw text in JSON brackets, so it might not be completely valid):
example-of-long-arse-text.jsonl (40.9 KB)
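For reference, a minimal sketch of what one line of that JSONL might look like for a text-based interface (only the "text" field is essential here; the "meta" block and its contents are my own illustrative assumptions):

```python
import json

# Hypothetical example: wrap one long raw article into a single JSON
# object per line. "meta" is an assumed optional field, used here just
# to carry a label identifying the document.
article = "A very long academic article, far more than 500 words ..."
line = json.dumps({"text": article, "meta": {"source": "example-article"}})

print(line)  # one complete JSONL record
```

Writing one such line per document produces a valid JSONL file, since each line is an independent JSON object.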
I am following up on this discussion thread because I encountered the same issue. I am also using the ner.manual recipe. For long documents, loading is extremely slow, and even annotating a single entity can take a few seconds. After cutting the documents into blocks of 50k characters, annotation was still infeasibly slow (several seconds per entity), so I am now cutting them into blocks of 5k characters, and that way there are no issues.
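The chunking step described above can be sketched roughly like this (the function name, the 5,000-character default, and the whitespace heuristic are my own assumptions, not a built-in recipe):

```python
# Hedged sketch: split a long document into chunks of at most `max_chars`
# characters, backing up to the nearest space so no word is cut in half.
def split_document(text, max_chars=5000):
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space inside the window, if there is one
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        start = end
    return [c for c in chunks if c]

long_text = "word " * 2000          # roughly 10,000 characters
chunks = split_document(long_text)  # each chunk stays within 5,000 characters
```

Each chunk can then be written out as its own JSONL task, which keeps every document the UI renders comfortably small.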
Thanks a lot.
Thanks for the details, this is very helpful! I was able to reproduce the problem with a large enough example. The good news is, the bottleneck is the browser – but the bad news is, the bottleneck is the browser. Essentially, it comes down to how the tokens are represented as individual containers in the DOM. Anyway, I have some ideas for a possible workaround that I want to test. I'll keep you updated!
Awesome, thank you for the update. Eagerly awaiting a browser-side fix.