Data too long for column 'content'

Hello,

I managed to get Prodigy working with mysql-server:8.0 in a Docker container. However, there was an issue that I’ve had to resolve manually and that I believe could be fixed in a future release:

Prodigy creates a table called example. This table has a column called content with blob as its data type, which stores the content of each example (for instance, an image if you are labelling images). The problem arises when an image is too big to fit in that column, in which case peewee throws an error:

peewee.DataError: (1406, "Data too long for column 'content' at row 1")

The way I’ve worked around this, for now, is to connect to the database manually after the first time the database is set up and run the following:

alter table example modify content LONGBLOB;

This now allows larger images to be stored.
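
For anyone who would rather script this step than run it by hand, here is a minimal sketch. It assumes you already have a DB-API connection to the Prodigy database (for example one created with a MySQL driver such as pymysql) and that the table and column are named example and content as described above. The function name widen_content_column is just an illustrative choice, not part of Prodigy.

```python
# Table and column names as reported in this thread.
TABLE_NAME = "example"
COLUMN_NAME = "content"


def widen_content_column(conn):
    """Change the content column to LONGBLOB using an open DB-API connection.

    `conn` is assumed to be a connection to the MySQL database that Prodigy
    uses, e.g. one returned by a MySQL driver's connect() function.
    """
    sql = f"ALTER TABLE {TABLE_NAME} MODIFY {COLUMN_NAME} LONGBLOB"
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
    finally:
        cursor.close()
    # MySQL commits DDL implicitly; the explicit commit is harmless.
    conn.commit()
    return sql
```

This only needs to run once, after Prodigy has created the table for the first time.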


Thanks for the report and the detailed analysis – I hadn’t seen this error before! We should be able to find a way to make it a longblob by default. Also good to see that the solution once the table is created is pretty straightforward.

Hi MajidD4t1qbit,

I saw you are doing some interesting work with Docker containers.
I was just curious to know in what kinds of use cases running Prodigy in Docker would be helpful (I haven’t worked with Docker myself).
I would be thankful if you could share some insight.

Thanks

Hi, @ines! I still get the same error. As I understand it, this happens because the maximum size of the BLOB type in MySQL is just 64 KB, which is really too small for images, whereas the maximum size of a blob in SQLite is 2 GB (!).

Does changing it to a LONGBLOB as described above resolve it for you? That should (apparently) allow up to 4 GB.
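
To put the numbers in this thread side by side, here is a small sketch that picks the smallest MySQL binary column type able to hold a payload of a given size. The byte limits are MySQL's documented maximums for each type; the helper name smallest_blob_type is made up for illustration. Note that the server's max_allowed_packet setting can impose a lower practical limit than the column type itself.

```python
# Maximum payload size in bytes for each MySQL binary column type,
# ordered from smallest to largest.
MYSQL_BLOB_LIMITS = [
    ("TINYBLOB", 2**8 - 1),     # 255 bytes
    ("BLOB", 2**16 - 1),        # 65,535 bytes (just under 64 KB)
    ("MEDIUMBLOB", 2**24 - 1),  # 16,777,215 bytes (just under 16 MB)
    ("LONGBLOB", 2**32 - 1),    # 4,294,967,295 bytes (just under 4 GB)
]


def smallest_blob_type(num_bytes: int) -> str:
    """Return the smallest MySQL blob type that can store num_bytes bytes."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    for type_name, limit in MYSQL_BLOB_LIMITS:
        if num_bytes <= limit:
            return type_name
    raise ValueError("payload is too large for any MySQL blob type")


if __name__ == "__main__":
    # A 5 MB payload does not fit in a plain BLOB.
    print(smallest_blob_type(5 * 1024 * 1024))
```

A plain BLOB tops out just below 64 KB, which is why almost any real image triggers the 1406 error.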

Alternatively, you could also choose not to store the images with the examples and use the new ImageServer loader instead (e.g. directly in a custom recipe or with --loader image-server in a built-in recipe). It does mean that you have to make sure you never rename or delete the original image files, but if that's not a problem, it gives you more flexibility and keeps your database much smaller.