We don't want to save BASE64 encoded image data into database

haoxi911 · September 5, 2018, 7:55am

We have plenty of images to manually classify and we will use these human classified images to build an image classification model.

One potential issue: as we review many images and save the decisions / labels into the database, we've noticed the database becoming bloated with base64 data. Is there a way to NOT save the image data into the database? We only need the image name (i.e. the filename).

ines · September 5, 2018, 9:09am

Sure, that should be no problem. The reason Prodigy does this by default is to make sure that the annotations and data are always aligned and you don’t lose any information or references to the original image.

The easiest and most flexible solution would probably be to format your input data to use absolute paths or URLs to your images, and to not use any of Prodigy’s image loaders or pre-processors in your recipe that convert the data to base64. If you have a lot of images, you might want to use something like an S3 bucket to make them available. Your data could then look something like this:

{"image": "https://path-to-your-bucket/image.jpg"}
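
For anyone who wants to generate that input file from a folder of images, here is a minimal sketch. It assumes the images sit in one directory and are reachable under a common base URL; the function names, the extension list and the example URL are made up for illustration and are not part of Prodigy.

```python
import json
from pathlib import Path
from urllib.parse import quote

# Assumption: these are the image types you care about.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def build_image_tasks(image_dir, base_url):
    """Return one {"image": url} dict per image file found in image_dir."""
    tasks = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # Percent-encode the filename so spaces etc. survive in the URL.
            url = base_url.rstrip("/") + "/" + quote(path.name)
            tasks.append({"image": url})
    return tasks


def write_jsonl(tasks, out_path):
    """Write one JSON object per line (JSONL)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")
```

Usage would be something like `write_jsonl(build_image_tasks("./images", "https://path-to-your-bucket"), "images.jsonl")`. Since each record only holds a URL, the filename can be recovered from it later.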

Alternatively, you can also write a workaround that lets you customise the data that’s stored in the database, and strip out the base64 data or replace it with a reference to the image. See the comments on this thread for details and examples.
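
As a rough illustration of the "strip out or replace" idea, the sketch below works on plain dicts only. It assumes the encoded image sits under the `"image"` key as a `data:` URI and that each example also carries a reference to the original file under a hypothetical `"path"` key. Where you call this in your recipe depends on your setup and isn't shown here.

```python
def strip_base64_images(examples, path_key="path"):
    """Return copies of the examples with inline image data removed.

    If an example has a reference under `path_key` (a hypothetical key
    chosen for this sketch), the image is replaced by that reference;
    otherwise the "image" entry is dropped.
    """
    cleaned = []
    for eg in examples:
        eg = dict(eg)  # shallow copy, so the caller's dicts are untouched
        image = eg.get("image")
        if isinstance(image, str) and image.startswith("data:"):
            if path_key in eg:
                eg["image"] = eg[path_key]
            else:
                del eg["image"]
        cleaned.append(eg)
    return cleaned
```

Examples whose `"image"` is already a path or URL pass through unchanged.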

haoxi911 · September 5, 2018, 9:23am

Makes sense now. I will build a list of file paths in my custom recipe and give it a try. Thank you!

ines · September 5, 2018, 9:35am

Sounds good!

Btw, one quick note, also in case others come across this topic later on: Browsers tend to block local file paths for security reasons, so if you want to load images into Prodigy by their absolute local path, you can either use a browser extension or serve the directory on a different localhost port. See this StackOverflow thread for more info.
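
To sketch the "serve the directory on a different localhost port" option with only the Python standard library (Python 3.7+): the function name, default port and host below are arbitrary choices for this example, not anything Prodigy provides.

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=8000, host="127.0.0.1"):
    """Serve the files in `directory` over HTTP in a background thread.

    Returns the server object; call .shutdown() and .server_close()
    on it when you are done.
    """
    # Bind the handler to the given directory instead of the current one.
    handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

With the defaults above, a file `cat.jpg` in that directory would be available at `http://127.0.0.1:8000/cat.jpg`, and that URL is what would go into the `"image"` field. From the command line, `python -m http.server 8000 --directory /path/to/images` does roughly the same thing.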

haoxi911 · September 5, 2018, 9:53am

Thank you, Ines. I am running Prodigy on a remote server, and my images are on that server as well.

Do you know if there is a way to configure the root dir of the web server in Prodigy, so that it can point to the image folder?

haoxi911 · September 5, 2018, 10:23am

Never mind, I have an nginx server that I can use to serve these images.

hadsed · September 25, 2018, 5:23pm

I think it’s a nice feature to be able to pass the image binary to the server. On my setup I have a remote machine that I do most of the heavy dev on, and I like to run Prodigy there and access it from my laptop’s browser over LAN/wifi. If I don’t want the Prodigy db to get bloated with the image binary blobs, I also need to bring up another file server. That’s certainly a hassle, and I love that Prodigy works all by itself. I think it’s not unreasonable to take a short hash of the image and save that. In that case, things can always stay consistent even if you mix up your paths, although recovery is a bit of work. And it keeps the DB lean, which has lots of benefits.
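
A minimal sketch of what such a short hash could look like, using SHA-256 from the standard library and keeping only the first few hex characters. The function names and the default length of 12 are arbitrary choices for this example, and this is not something Prodigy does out of the box as far as this thread shows.

```python
import hashlib


def short_image_hash(image_bytes, length=12):
    """Return the first `length` hex characters of the SHA-256 digest."""
    return hashlib.sha256(image_bytes).hexdigest()[:length]


def short_file_hash(path, length=12, chunk_size=1 << 20):
    """Same as short_image_hash, but reads the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]
```

Storing this value alongside the filename means a renamed or moved image can still be matched back to its annotation by re-hashing the files. A shorter prefix keeps the DB smaller but raises the chance of collisions on very large image sets.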

haoxi911 · October 1, 2018, 3:04am

Agreed. That said, when I use BASE64-encoded images, I have to wait a few seconds after clicking the ‘Save’ button; this is much faster when the images are served via a separate file server. So there are pros and cons, and it’s good that we have the flexibility to choose.
