How to export image.manual annotations without the base64 image string

I'm new to Prodigy. I ran image.manual and annotated the images with bounding boxes, then exported the annotations using db-out. However, the "image" key contains the base64 string of the image. What do I need to do to get annotations with just the image file name, without any base64 string at all?
Thank you very much.

Hi! What you get out at the end is always what you load in (which is kind of a core principle so you don't lose any data). If you don't want the images to be encoded, you can load in the image URLs from a JSONL file instead of the image directory.
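As a rough sketch of what preparing such a file could look like: each line of the JSONL file is one JSON object with an "image" key holding the URL. The helper name write_image_jsonl and the example.com URLs in the note below are made up for illustration; only the {"image": ...} shape comes from the format described here.

```python
import json


def write_image_jsonl(urls, out_path):
    """Write one {"image": url} task per line (JSONL)."""
    with open(out_path, "w", encoding="utf8") as f:
        for url in urls:
            # Only the URL is stored, never the image bytes themselves.
            f.write(json.dumps({"image": url}) + "\n")
```

Calling write_image_jsonl(["https://example.com/kitchen_01.jpg"], "images.jsonl") would produce a file you can pass to the recipe with --loader jsonl.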

I've posted an example of this in this thread the other day:

Thanks Ines.
I uploaded my images to an AWS S3 bucket and created a JSONL file, aws-images.jsonl, containing something like this:
{"image":"", "id":1}
{"image":"", "id":2}

And then executed:
prodigy image.manual my_dataset aws-images.jsonl --loader jsonl --label REFRIGERATOR,SINK,DISHWASHER,BATHTUB,SHOWER,TOILET,BED

I still have the base64 image string in the annotation output file. Did I do something wrong?


When you're exporting the data, are you looking at all the annotations in the dataset? The old annotations you've collected that include the image data will of course still be there – you've just changed the format of the new data you're reading in.

You can always convert the previous data and remove the "image" field and replace it with the filename (which should also be in the data) and then reupload it to a new dataset. Just make sure you keep the original files – if they change, you'll lose the reference to your annotations.
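A minimal sketch of such a conversion, working on a file exported with db-out. Which key holds the file name depends on how the data was loaded, so the filename_key parameter (defaulting to "path") is an assumption; check your exported examples and adjust it. The function names are hypothetical.

```python
import json


def strip_base64(example, filename_key="path"):
    """Return a copy of the example with base64 image data replaced by the file name.

    filename_key is an assumption: use whichever key holds the file name in your data.
    """
    image = example.get("image")
    is_base64 = isinstance(image, str) and image.startswith("data:")
    if is_base64 and filename_key in example:
        example = dict(example)  # shallow copy, leave the input untouched
        example["image"] = example[filename_key]
    return example


def convert_export(in_path, out_path, filename_key="path"):
    """Rewrite a JSONL export line by line, stripping base64 image data."""
    with open(in_path, encoding="utf8") as src, open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            if line.strip():
                eg = strip_base64(json.loads(line), filename_key)
                dst.write(json.dumps(eg) + "\n")
```

The converted file can then be imported into a new dataset with db-in. Examples without a usable file name are left unchanged, so nothing is lost silently.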

I created a new dataset and started from the beginning.

Ah, sorry – I forgot that the built-in implementation always fetches the images by default. We should probably have an option that just lets you toggle this on the command line. You can just remove the following line from the recipe function: stream = fetch_images(stream).

When you look at the function, you'll see that it's actually really small and straightforward, so you might also just want to write your own custom recipe.

Sorry, I still have a problem :). When I remove stream = fetch_images(stream), it throws an error like NameError: name 'stream' is not defined.

And if we remove 'stream' from return {....'stream':...} and run prodigy image.manual with the custom recipe, nothing happens; it doesn't show anything.


Yeah, you definitely shouldn't be removing the whole stream. The stream is the generator of examples that you're annotating. You can find more details on this in the documentation:

Maybe you removed too many lines? I'm not sure what you're editing, but for me, the recipe looks like this:

stream = get_stream(source, api=api, loader=loader, input_key="image")
stream = fetch_images(stream)

And I'm suggesting to remove the second line. If you do that, the variable stream will still be defined.

Hi, I have a similar question about how to prevent images from being stored as base64 strings in the database. I'm loading the images from a directory on the local machine. This is the recipe I'm using:

import prodigy
from prodigy.components.loaders import Images

def add_options(stream, labels):
    # Attach the same list of choice options to every incoming example
    options = [{"id": label, "text": label.strip()} for label in labels]
    for eg in stream:
        eg["options"] = options
        yield eg

def image_choice(dataset, source, labels):
    # Images() loads files from a directory and encodes them as base64
    stream = Images(source)
    stream = add_options(stream, labels)
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "choice",
        "config": {"choice_style": "multiple"},
        "feed_overlap": True,
    }

What should I change in order to stop the images themselves from being stored in the database?

Hi! If you're using Prodigy v1.9.4+, the easiest way would be to use the ImageServer loader instead of Images. This will serve your image directory using the Prodigy web server so Prodigy can refer to them by a URL (and doesn't have to include the actual image data).


Just released Prodigy v1.10, which introduces a new before_db recipe callback that lets you remove any base64 data before the examples are placed in the database. The image.manual recipe now also has a --remove-base64 flag that takes care of this automatically.
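A minimal sketch of what such a callback could look like. It assumes each task carries the original file path under a "path" key next to the base64 "image" value; if your tasks don't have that key, the example is left as it is.

```python
def before_db(examples):
    """Swap base64 image data for the file path before examples are saved."""
    for eg in examples:
        image = eg.get("image")
        # Only touch inline data URIs, and only if a path is available (assumed key).
        if isinstance(image, str) and image.startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples
```

In a custom recipe, the function would be returned alongside the other components, e.g. "before_db": before_db in the returned dictionary. The annotation UI still receives the full image data; only the stored copy is slimmed down.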
