Only part of the files loaded

I have a dataset of 100 images (saved locally) and I'm using a JSONL file to store the image locations and the existing bounding boxes.
A single line in the file may look like this:

{"image": "animals.png", "width": 2200, "height": 1700, "spans": [{"_id": "1", "points": [[55.0, 95.0], [187.0, 95.0], [187.0, 73.1], [55.0, 73.1]],  "label": "cat", "color": "#8011ee"},...]}

The command that I used looks like this:

prodigy image.manual animal_pictures.jsonl --loader jsonl --label cat,dog

Everything worked normally at the beginning, but after I finished labeling the first 80 images, it showed 'no tasks available'. (I'm sure that this dataset was newly created and the other 20 images had not been labeled before.)
I ended the session and restarted it with the same command (same output dataset and same input JSONL file). That worked, but this time I only got to label images 81-96 before it showed 'no tasks available' again. After that, no matter whether I restarted the session or reloaded the web page, I just couldn't load the last 4 images.
Could anyone tell me the potential reason for this error? Thank you!

Thanks for the report! How large are your image files? And are there differences in file size between them?

I wonder if this is related to the stream generator somehow "timing out" and returning an empty list, which would then cause Prodigy to think that no more tasks are available. There could also be some weird interaction with the new async serving (introduced in v1.9) that causes empty batches to be sent out if something (e.g. image loading) is blocking :thinking:
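
To make that concrete, here's a rough, Prodigy-agnostic sketch of the suspected failure mode (not the actual serving code): a batcher that pulls from a slow producer with a timeout can come back with an empty batch, which looks exactly like an exhausted stream:

import queue
import threading
import time

def producer(tasks, q):
    # Simulate a stream that's slow to produce each task (e.g. image loading)
    for task in tasks:
        time.sleep(2)
        q.put(task)

def get_batch(q, size=10, timeout=1.0):
    # Pull up to `size` tasks, giving up after `timeout` seconds per task
    batch = []
    try:
        for _ in range(size):
            batch.append(q.get(timeout=timeout))
    except queue.Empty:
        pass
    return batch  # an empty list here is indistinguishable from "stream done"

q = queue.Queue()
threading.Thread(target=producer, args=(range(100), q), daemon=True).start()
print(get_batch(q))  # prints [] because production is slower than the timeout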

The images are around 200 kB each. Below is a screenshot of some examples.

[screenshot of example tasks]

Is there something I can try to avoid this kind of problem?

The images are not large, but there might be around 10-100 existing bounding boxes on each image. Could this be related? Too many pre-labeled bounding boxes?

Hi @sissi!

I tried to replicate your environment and the whole process, but I wasn't able to reproduce the issue. :confused:

I used the script below to generate 100 images (by copying a single source image) and a file animals.jsonl with one line per image, each image having 100 pre-defined spans.

import shutil
from pathlib import Path

import srsly

if __name__ == "__main__":
    dir_path = Path("./img")
    dir_path.mkdir(exist_ok=True)  # make sure the target directory exists
    source_file = Path("./animal000.jpg")
    lines = []
    for i in range(1, 101):
        # Copy the same source image to img/animal001.jpg ... img/animal100.jpg
        file_name = f"animal{i:03}.jpg"
        file_path = dir_path / file_name
        shutil.copyfile(source_file, file_path)
        # Give each image 100 pre-defined boxes, offset by the span index
        spans = []
        for span in range(100):
            spans.append(
                {
                    "_id": f"{i}_{span}",
                    "points": [
                        [55.0 + span, 95.0 + span],
                        [187.0 + span, 95.0 + span],
                        [187.0 + span, 73.1 + span],
                        [55.0 + span, 73.1 + span],
                    ],
                    "label": "cat",
                    "color": "#8011ee",
                }
            )
        lines.append(
            {"image": f"img/{file_name}", "width": 2200, "height": 1467, "spans": spans}
        )
    # One task per line, the JSONL format Prodigy expects
    srsly.write_jsonl("animals.jsonl", lines=lines)

Then I ran it with:

$ prodigy image.manual animals8 animals.jsonl --loader jsonl --label cat,dog

Using the latest version of Prodigy.

I went through the whole dataset several times (each time with a new Prodigy dataset, like animals8). I tried different combinations: adding annotations to each image, and also accepting or rejecting quickly, to make sure that anything related to concurrency would show up.

But I wasn't able to reproduce your issue :confused: Every time, I got all the images and finished at exactly 100.

Maybe you could try generating a fake/test dataset with that script and see if it works correctly. If it works as expected with the generated dataset, the problem might be something related to your dataset or the configs used with it. If it shows the same error, then your environment and mine probably differ in some way... Try the latest Prodigy version and a recent Python (e.g. 3.7). Also check the logs; there could be an error somewhere that gives a hint about what's happening underneath.
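
Prodigy's logging can be enabled with the PRODIGY_LOGGING environment variable, for example:

$ PRODIGY_LOGGING=basic prodigy image.manual animals8 animals.jsonl --loader jsonl --label cat,dog

(PRODIGY_LOGGING=verbose prints even more detail.)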

Thank you for testing this out! I'm using Python 3.7 and Prodigy 1.9.6. I tried with a test dataset before, and it seems that this issue is pretty random. Here's what I tried:

  1. Same input data, same commands (the only difference was the dataset name): I was able to finish all 100 pages in the test dataset.
  2. A new .jsonl with only the lines that didn't load in the first annotation session, started in a new dataset: I was also able to finish annotating these lines.

I did change some of the configuration; everything I changed is shown below:

 "custom_theme": {
   "cardMaxWidth": "1000px"
 },
"db_settings": {
    "sqlite": {
      "name": "prodigy.db",
      "path": "my_path"
    }
  },
"global_css": ".prodigy-title {font-size: 14px} .prodigy-content svg text {font-size:20px !important; opacity:0.6} .prodigy-content path {stroke-width:2 !important}",
"labels": ["cat","dog","pig","bird"...]

I wasn't able to reproduce the issue either :sweat_smile: annotations went well after that.

Argh, I don't like it when bugs get shy and disappear without giving us a chance to understand them... :sweat_smile: :bug:

But I'm glad it seems you solved the issue for now! :tada: