Hi,
I use the text categorization recipe, and saving an annotation task that contains long text (more than 64 KB) causes a problem.
The saved annotation is truncated and the following warning is logged:
/home/user_name/.virtualenvs/my_virtual_env/lib/python3.6/site-packages/pymysql/cursors.py:170: Warning: (1265, "Data truncated for column 'content' at row 1")
The truncation corrupts the JSON of the annotation task.
As I understand it, the truncation happens because the column content has type Blob, which is limited to 64 KB.
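To illustrate what I think is happening, here is a minimal sketch that only uses the standard library. It is not Prodigy's code: the function names and the sample tasks are made up for the example, and the only assumption taken from MySQL is that a BLOB column holds at most 65,535 bytes.

```python
import json

# MySQL BLOB columns hold at most 2**16 - 1 bytes.
BLOB_MAX_BYTES = 65535


def simulate_blob_truncation(task, limit=BLOB_MAX_BYTES):
    """Serialize a task to JSON bytes and cut it at `limit` bytes (hypothetical helper)."""
    return json.dumps(task).encode("utf-8")[:limit]


def is_valid_json(raw):
    """Return True if the stored bytes still decode and parse as JSON."""
    try:
        json.loads(raw.decode("utf-8"))
        return True
    except ValueError:
        # Covers both a broken JSON document and a multi-byte character cut in half.
        return False


# Made-up tasks: one above the limit, one well below it.
long_task = {"text": "a" * 70000, "label": "EXAMPLE", "answer": "accept"}
short_task = {"text": "a" * 100, "label": "EXAMPLE", "answer": "accept"}

stored_long = simulate_blob_truncation(long_task)
stored_short = simulate_blob_truncation(short_task)

print(len(stored_long), is_valid_json(stored_long))
print(len(stored_short), is_valid_json(stored_short))
```

The long task is cut in the middle of the text string, so the closing quote and brace are lost and the stored value can no longer be parsed, which matches the kind of error shown in the traceback.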
I run into the problem in the following two cases:
Case 1:
I call the get_dataset() function of the Database class defined in the db.py module. As a result, I cannot retrieve the annotated tasks.
Case 2:
I restart Prodigy with the same parameters as in the previous run. As a result, Prodigy fails to start.
In both cases I get the following exception:
File "/home/user_name/.virtualenvs/my_virtual_env/lib/python3.6/site-packages/prodigy/components/db.py", line 297, in get_dataset
return [eg.load() for eg in examples]
File "/home/user_name/.virtualenvs/my_virtual_env/lib/python3.6/site-packages/prodigy/components/db.py", line 297, in <listcomp>
return [eg.load() for eg in examples]
File "/home/user_name/.virtualenvs/my_virtual_env/lib/python3.6/site-packages/prodigy/components/db.py", line 99, in load
return srsly.json_loads(content)
File "/home/user_name/.virtualenvs/my_virtual_env/lib/python3.6/site-packages/srsly/_json_api.py", line 38, in json_loads
return ujson.loads(data)
ValueError: Unmatched ''"' when when decoding 'string'
How can this be solved? Thank you!