Hello Prodigy team,
We are investigating a puzzling save error that occurs on our Prodigy instances when annotating very large text documents.
Error message:
Here are our specs:
Docker Image running in Kubernetes pod
Built with Python 3.10.11
Using Prodigy 1.12.7
DB: MySQL 8
Recipe: ner.manual with --highlight-chars
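For reference, the session was started with a command along these lines (the dataset name, model, source file, and labels below are placeholders, not our actual values):

```shell
# Hypothetical invocation mirroring our setup: ner.manual with
# character-level highlighting and verbose logging enabled.
PRODIGY_LOGGING=verbose prodigy ner.manual my_dataset blank:en large_docs.jsonl \
  --label PERSON,ORG --highlight-chars
```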
Here is what we did and observed:
We noticed that when a user reaches very large documents (above 18,000 characters), the mentioned error pops up.
To replicate the error, we created a dataset whose documents have 18,000+ characters.
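The replication dataset can be generated with a minimal sketch like the following (the file name and filler text are placeholders, not our actual data):

```python
import json

# Build a handful of synthetic documents, each longer than 18,000 characters,
# to mirror the size at which the error appears.
docs = [{"text": f"doc {i} " + ("lorem ipsum " * 1600)} for i in range(5)]

# Write them as a JSONL source file for ner.manual.
with open("large_docs.jsonl", "w", encoding="utf-8") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

print(all(len(d["text"]) > 18_000 for d in docs))  # prints True
```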
Then we checked the logs with PRODIGY_LOGGING=basic and PRODIGY_LOGGING=verbose, and found the following when the first large document loads:
...32800, 'end': 32801, 'id': 27860, 'ws': False}, {'text': 'g', 'start': 32801, 'end': 32802, 'id': 27861, 'ws': False}], '_view_id': 'ner_manual'}], 'total': 0, 'progress': 0.357, 'session_id': '2023-11-15_13-30-46'}
INFO: 100.99.30.69:37200 - "POST /get_session_questions HTTP/1.0" 200 OK
When the user then selects any answer in the Prodigy UI and tries to save the document, the error appears.
The logs show no indication of the cause: they just stop at 200 OK, and the UI cannot save any documents past the one that triggered the issue.
If another user session is created, work continues until the next large document is encountered, and so on.
Interestingly, when we ran a test on a local machine in a plain Python environment with the same Prodigy version (no Docker, no Kubernetes), we could not reproduce the issue: the documents were saved into the MySQL DB without any errors.
Do you have any idea from your end what could be causing this issue?