We have plenty of images to classify manually, and we will use these human-classified images to build an image classification model.
One potential issue: after reviewing many images and saving the decisions / labels to the database, we realized the database gets bloated with all the base64 data. Is there a way to NOT save the image data to the database? We only need the image name (i.e. the filename).
Sure, that should be no problem. The reason Prodigy does this by default is to make sure that the annotations and data are always aligned and you don’t lose any information or references to the original image.
The easiest and most flexible solution would probably be to format your input data to use absolute paths or URLs to your images, and to avoid any of Prodigy’s image loaders or pre-processors in your recipe that convert the data to base64. If you have a lot of images, you might want to use something like an S3 bucket to make them available. Your data could then look something like this:
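A minimal sketch of how such an input file could be generated, assuming the images are reachable under a common base URL. The helper names, the base URL and the "meta" contents are made up for illustration; the only thing taken from Prodigy's task format is that each task references its image under the "image" key.

```python
import json
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_tasks(image_dir, base_url):
    """Yield one task per image file, pointing at a URL instead of base64 data."""
    prefix = base_url.rstrip("/")
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() in IMAGE_EXTENSIONS:
            # Only the URL and the filename are stored, never the image bytes.
            yield {"image": f"{prefix}/{path.name}", "meta": {"file": path.name}}


def write_jsonl(tasks, out_path):
    """Write the tasks as newline-delimited JSON, one task per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")
```

Each line of the resulting file would then be a small record such as {"image": "https://example.com/images/a.jpg", "meta": {"file": "a.jpg"}}, so the database only ever sees the reference.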
Alternatively, you can also write a workaround that lets you customise the data that’s stored in the database, and strip out the base64 data or replace it with a reference to the image. See the comments on this thread for details and examples.
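As a rough illustration of the "strip out the base64 data" idea, here is a plain Python function that swaps inline image data for a reference before the examples are saved. It is only a sketch: the function name and the "path" key are assumptions (your tasks may keep the filename somewhere else), and how you hook it into your recipe depends on your Prodigy version and is not shown here.

```python
def strip_base64(examples, ref_key="path"):
    """Return copies of the examples with base64 image data replaced by a reference.

    Assumes each task keeps the original file location under `ref_key`
    (hypothetical; adjust to wherever your loader stores it).
    """
    cleaned = []
    for eg in examples:
        eg = dict(eg)  # shallow copy, so the tasks shown in the UI stay untouched
        image = eg.get("image")
        ref = eg.get(ref_key)
        # Data URIs start with "data:"; only replace them if a reference exists,
        # so an example never ends up with no pointer to its image at all.
        if isinstance(image, str) and image.startswith("data:") and ref:
            eg["image"] = ref
        cleaned.append(eg)
    return cleaned
```

Examples that already use a URL or path are passed through unchanged.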
Makes sense now. I will build a list of file paths in my custom recipe and give it a try. Thank you!
Btw, one quick note, also in case others come across this topic later on: Browsers tend to block local file paths for security reasons, so if you want to load images into Prodigy by their absolute local path, you can either use a browser extension or serve the directory on a different localhost port. See this StackOverflow thread for more info.
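For the "serve the directory on a different localhost port" option, a minimal sketch using only Python's standard library could look like this. The function name, port and directory are placeholders, and this is meant for local development rather than as a hardened file server.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000, host="127.0.0.1"):
    """Serve the files in `directory` over HTTP from a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # Daemon thread: the server stops when the main program exits.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

With something like serve_directory("/path/to/images", port=8000) running, a task could reference an image as http://localhost:8000/a.jpg. From the command line, python -m http.server 8000 --directory /path/to/images does roughly the same thing.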
Thank you Ines. I am running Prodigy on a remote server, and my images are on that server as well.
Do you know if there is a way to configure the root dir of the web server in Prodigy, so that it can point to the image folder?
Never mind, I have an nginx server that I can use to serve these images.
I think it’s a nice feature to be able to pass the image binary to the server. On my setup I have a remote machine that I do most of the heavy dev on, and I like to run Prodigy on there and access it from my laptop’s browser over LAN/wifi. If I don’t want the Prodigy DB to get bloated with the image binary blobs, I also need to bring up another file server. It’s certainly a hassle, and I love that Prodigy works all by itself. I think it’s not unreasonable to take a short hash of the image and save that. In that case, things can always stay consistent even if you mix up your paths, although recovery is a bit of work. But this keeps the DB lean, which has lots of benefits.
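To make the hash idea concrete, here is a small sketch of what I mean. This is not something Prodigy does today as far as I know; the function name, the choice of SHA-256 and the 12-character length are just my assumptions.

```python
import hashlib


def short_image_hash(path, length=12, chunk_size=1 << 20):
    """Return a short hex digest of the file contents at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]
```

Since the hash depends only on the bytes of the image, it stays the same if the file is moved or renamed. Recovery then means re-hashing the image folder and matching the values against what was stored.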
Agreed. One caveat: when I use base64-encoded images, I have to wait a few seconds after clicking the ‘Save’ button, and this is much faster when the images are served via a separate file server. So there are pros and cons, and it is good that we have the flexibility to choose.