I've been noticing that our Prodigy interface has become slower recently. We've run a few textcat sessions lately, and there's usually a delay in the interface whenever we hit accept. The delay seems to be proportional to the length of the texts in the dataset and increases as we label more examples for that set. This didn't happen earlier: datasets of a size that caused no noticeable lag a couple of months ago are now causing up to 15 seconds of lag every time we hit accept on an example.
We are using the textcat recipe.
We currently have 121 datasets in the Prodigy DB, some for NER, most for textcat.
The texts are about 600 words long on average.
The actual size of the Prodigy SQLite DB is just 412 MB.
Any ideas on what might be happening and what we can do to fix this?
Thanks for your message and welcome to the community!
Have you tried, or can you try, enabling logging? If so, can you tell which steps are taking up most of the time?
You can enable it by prefixing your commands:
PRODIGY_LOGGING=basic python -m prodigy ...
You can also use PRODIGY_LOGGING=verbose instead of basic to get even more detail.
Which recipes are you running? It sounds like the issue is with annotation rather than training; are you seeing any problems with training as well? Also, are you using any active learning?
Also, some recipes (e.g., relations) do have known issues with longer texts (500+ words). In general, shorter texts work better for annotation. Here's one set of tips on splitting up your tasks:
Sorry, I think I found the problem, and it turned out to be an issue on my end. Every step being logged seems to be pretty quick, but more time is being eaten up on the front end whenever I move to the next example. I'm using custom JS for textcat that highlights some keywords on the front end to make labeling a bit easier, and I can confirm that removing the words to highlight reduces the lag.
What's really odd is that when we started using the custom JS there was no lag; it slowly increased with no change to the code or keywords, got worse when we had multiple people doing the labeling, and improved a bit when we restarted the server we were running it on. So I assumed it was a backend issue. I'll have to look at this more carefully, but any idea why these problems might arise just from some custom front-end JS code?
I'm glad to hear that you're getting a little closer to diagnosing the problem.
That's hard to say without seeing the full code and/or logs. You're welcome to share more snippets of either, though I understand if you don't want to post them publicly. A small reproducible example of your JavaScript would be the most helpful.
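To give you an idea of the kind of snippet that would help, here is a stripped-down sketch of what a keyword highlighter for textcat might look like. This is purely hypothetical, not your actual code; the keyword list is a placeholder and the .prodigy-content selector may need adjusting for your interface:

// Hypothetical sketch of a keyword highlighter (placeholders throughout)
const KEYWORDS = ['placeholder', 'keyword'];

function highlightKeywords() {
  // Adjust the selector to whatever element wraps the task text in your interface
  const container = document.querySelector('.prodigy-content');
  if (!container) return;
  let html = container.innerHTML;
  for (const kw of KEYWORDS) {
    // Naive replace; a real version should escape regex special characters
    html = html.replace(new RegExp('\\b' + kw + '\\b', 'gi'), m => '<mark>' + m + '</mark>');
  }
  container.innerHTML = html;
}

// 'prodigyupdate' fires each time the current task changes
document.addEventListener('prodigyupdate', highlightKeywords);

Something at that level of detail, plus a rough idea of how many keywords you're matching, would make it much easier to see where the time is going.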
Two high-level questions:
What version of Prodigy are you running? (e.g., run prodigy stats)
How are you setting up multiple annotators, and are they annotating concurrently on the same task? Typically there are two options for multi-user setups: unique session IDs (e.g., appending ?session=alice to the app URL) or a separate port/dataset per annotator. Here's a bit of background on both:
Since it's sounding like a front-end/JavaScript issue, I don't suspect a feed issue. But since you notice more problems when multiple users are annotating, I'm not ruling it out.
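For what it's worth, one pattern that can produce exactly this kind of creeping per-example lag is registering a new listener (or otherwise accumulating work) inside a handler that runs for every task: each accept then does a little more work than the last, and things only reset on a page reload, which might also explain why restarting the server seemed to help. This is a purely hypothetical sketch of that anti-pattern, since we haven't seen your code, and highlightCurrentTask is just a stand-in for your highlighting routine:

// Hypothetical anti-pattern, not your actual code
function highlightCurrentTask() {
  // imagine the keyword-highlighting work happening here
}

document.addEventListener('prodigyupdate', () => {
  // BUG: a brand-new anonymous listener is added for every task, so after
  // N examples each accept triggers N highlighting passes over the text
  document.addEventListener('prodigyanswer', () => highlightCurrentTask());
});

// Safer: register each listener exactly once, at the top level of the custom JS
// document.addEventListener('prodigyanswer', highlightCurrentTask);

If your code attaches listeners, timers, or MutationObservers per example rather than once at the top level, that would be the first thing I'd check.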
Feel free to add anything else you've learned or anything you think may be relevant.