I ran into a similar problem and couldn't find a good example of a solution. This is the custom loader that ended up working for me:
import json

import boto3
import prodigy
from prodigy.util import img_to_b64_uri


@prodigy.recipe("stream-from-s3")
def stream_from_s3(bucket, prefix=None):
    # Connect to S3.
    s3 = boto3.client('s3')
    # Build a paginator for when there are a lot of objects.
    paginator = s3.get_paginator('list_objects')
    paginate_params = {
        'Bucket': bucket
    }
    # Only load certain images from S3 if a prefix was given.
    if prefix is not None:
        paginate_params['Prefix'] = prefix
    page_iterator = paginator.paginate(**paginate_params)
    # Iterate through the pages.
    for page in page_iterator:
        # Iterate through the objects on the page ('Contents' is missing
        # when a page is empty, so fall back to an empty list).
        for obj in page.get('Contents', []):
            img_key = obj['Key']
            # Read the image bytes.
            img = s3.get_object(Bucket=bucket, Key=img_key)['Body'].read()
            # Print the JSON task that Prodigy expects on stdout.
            print(json.dumps({'image': img_to_b64_uri(img, 'image/jpeg')}))
You could then use the custom loader to pipe images to your annotator.
prodigy stream-from-s3 BUCKET PREFIX -F s3_loader.py | prodigy mark DATASET - --label LABEL --view-id classification
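One refinement I'd consider (just a sketch, not battle-tested): instead of hard-coding the JPEG MIME type, guess it from the object key with the standard-library mimetypes module and skip anything that isn't an image, so folder placeholder keys or stray files in the bucket don't break the stream. The helper name iter_image_tasks is mine, and it assumes img_to_b64_uri accepts any image/* MIME string:

import json
import mimetypes

import boto3
from prodigy.util import img_to_b64_uri


def iter_image_tasks(bucket, prefix=None):
    # Yield one Prodigy task dict per image object in the bucket.
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects')
    params = {'Bucket': bucket}
    if prefix is not None:
        params['Prefix'] = prefix
    for page in paginator.paginate(**params):
        for obj in page.get('Contents', []):
            key = obj['Key']
            # Guess the MIME type from the extension, e.g. .png -> image/png.
            mime_type, _ = mimetypes.guess_type(key)
            # Skip folder placeholders and anything that isn't an image.
            if mime_type is None or not mime_type.startswith('image/'):
                continue
            body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
            yield {'image': img_to_b64_uri(body, mime_type)}

The recipe body above could then just loop over iter_image_tasks(bucket, prefix) and print each task as JSON.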
I put the code in this GitHub repo. Happy to hear any suggestions for how this could be improved!