Getting "No tasks available" message, all images are not getting loaded to prodigy!

Hey, I passed around 200 images loaded from S3 to Prodigy to annotate. While annotating, I received about 72 images and then didn't receive any more; instead, a "No tasks available" message was shown. I tried refreshing the page and loading it in incognito mode, but neither worked.
Note: This doesn't happen every time, but it is happening frequently now.

Sharing my recipe code and the command I use (started via a subprocess call in Python):
1. The main recipe code, where I load images from S3:
#Code starts here

import ast
import json
from io import BytesIO

import boto3
import numpy as np
import prodigy
from PIL import Image

# model_setter, get_unique_img_id, do_predict and img_to_b64_uri are
# helpers defined elsewhere in loader.py.

@prodigy.recipe("stream-from-s3")
def stream_from_s3(model_path, bucket, labels_string, labels_color_dict, prefix=None):
    # Parse the color dict from the command line safely, without eval().
    labels_color_dict = ast.literal_eval(labels_color_dict)
    label_list_to_predict = [x.strip() for x in labels_string.split(',')]

    s3 = boto3.client('s3')
    model_setter(model_path)

    # Build a paginator for when there are a lot of objects.
    paginator = s3.get_paginator('list_objects')
    paginate_params = {'Bucket': bucket}

    # Check if only certain images from S3 should be loaded.
    if prefix is not None:
        paginate_params['Prefix'] = prefix

    page_iterator = paginator.paginate(**paginate_params)

    # Iterate through the pages.
    image_counter = 0
    for page in page_iterator:
        # Iterate through items on the page; .get() guards against a
        # page with no 'Contents' key (e.g. an empty listing).
        for obj in page.get('Contents', []):
            img_key = obj['Key']

            prodigy.log(f"Processing key: {img_key}")
            if not img_key.endswith('.jpg'):
                continue

            unique_img_id = get_unique_img_id(image_counter)
            image_counter += 1

            # Read the image bytes from S3.
            _img_bytes = s3.get_object(Bucket=bucket, Key=img_key).get('Body').read()
            img = np.array(Image.open(BytesIO(_img_bytes)))

            # Only pre-annotate if a model path was given.
            spans = []
            if model_path:
                spans = do_predict(img, label_list_to_predict, labels_color_dict)

            # Emit one JSONL task per image in the format Prodigy expects.
            # flush=True so the downstream image.manual process receives
            # tasks promptly instead of waiting on stdout buffering.
            print(json.dumps({
                'id': unique_img_id,
                'key': img_key,
                'image': img_to_b64_uri(_img_bytes, 'image/jpeg'),
                'spans': spans,
            }), flush=True)

2. The command we use to start Prodigy:
prodigy stream-from-s3 myapp-dev "headmask_present, headmask_absent" "{'headmask_present': (0.16, 0.0, 1.0), 'headmask_absent': (0.5481762597512125, 1.0, 0.0)}" s3bucket/ -F loader.py | prodigy image.manual dataset-001 - --loader jsonl --label "headmask_present, headmask_absent" --no-fetch --remove-base64
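
For reference, the same stream could also be returned directly from the recipe instead of being printed as JSONL and piped into a second Prodigy process. This is a minimal sketch of Prodigy's standard recipe contract, not the working setup above; build_s3_stream is a hypothetical stand-in for the generator logic in the loader:

import prodigy

@prodigy.recipe("stream-from-s3-inline")
def stream_from_s3_inline(dataset, bucket, prefix=None):
    # build_s3_stream is a placeholder: it would yield the same task
    # dicts that the loader above prints as JSONL.
    stream = build_s3_stream(bucket, prefix)
    return {
        "dataset": dataset,              # dataset to save annotations into
        "stream": stream,                # generator of task dicts
        "view_id": "image_manual",       # same interface image.manual uses
        "config": {"labels": ["headmask_present", "headmask_absent"]},
    }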

Hi! What batch size are you using, and how fast/slow is what you're doing in do_predict? Is there ever a scenario where your bucket may return an empty list? Alternatively, if you're using a model to make the predictions, does it ever try to start multiple threads (e.g. via PyTorch)? If so, this could potentially cause a problem here.

What batch size are you using and how fast/slow is what you're doing in do_predict?
The batch size is the S3 paginator's default. do_predict takes around 1 second per image.
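
If it matters, the page size can also be set explicitly rather than relying on the paginator's default, using boto3's standard PaginationConfig. A small sketch, reusing the s3 client and bucket from the loader; 50 is an arbitrary value:

page_iterator = s3.get_paginator('list_objects').paginate(
    Bucket=bucket,
    PaginationConfig={'PageSize': 50},  # objects returned per page; 50 is arbitrary
)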

Is there ever a scenario where your bucket may return an empty list?
Most of the times we saw this issue, there were 200+ images in the bucket, and we got "No tasks available" after around 60-70 images.

multiple threads
No, right now it's all single-threaded.

If so, this could potentially cause a problem here.
We also suspected that this might be the issue, but we did not see any errors in the logs/stderr.

To get to the root cause, we would like to understand the following:

  1. How does the prodigy image.manual recipe decide that there are no more tasks available?
  2. How does the Prodigy recipe ask for more images over the pipe? Does the line "for page in page_iterator:" block until image.manual asks for more?
  3. If this error happens again in the future, is there a way to "restart" annotation from the last point? We know the dataset ID; can we somehow use it to resume? How would the "stream-from-s3" recipe know where to resume from? (One possible approach is sketched after this list.)
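
On point 3, one possible approach, as a sketch rather than an official resume mechanism: read the saved examples back out of the dataset and skip their S3 keys when rebuilding the stream. connect() and get_dataset() are Prodigy's database helpers; the filtering assumes the 'key' field the loader emits survives in the saved examples:

from prodigy.components.db import connect

db = connect()  # connects to whichever database Prodigy is configured to use
annotated = db.get_dataset("dataset-001") or []
done_keys = {eg.get("key") for eg in annotated}

# Inside the loader's loop, skip anything already annotated:
# if img_key in done_keys:
#     continue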

Appreciate your quick response!
P.S.: Is there a way to show the total count of S3 bucket images in the Prodigy UI somehow?
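
Not an official counter feature as far as we know, but Prodigy does render each task's "meta" dict in the corner of the annotation card, so one workaround is to count the objects up front and attach the total to every emitted task. A sketch, reusing the s3 client and bucket from the loader, at the cost of one extra listing pass:

# Count the .jpg objects once before streaming.
total_images = sum(
    1
    for page in s3.get_paginator('list_objects').paginate(Bucket=bucket)
    for obj in page.get('Contents', [])
    if obj['Key'].endswith('.jpg')
)

# Then include it in every emitted task, e.g.:
# 'meta': {'image_count_in_bucket': total_images}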