image.manual returns ValueError: Unmatched ''"' when when decoding 'string'

This has been happening to me quite often, especially since the Prodigy update (v1.8.3). It keeps corrupting the whole annotation dataset, over and over again. I can't even export the results now.

I run this (below) and save the annotations:

prodigy image.manual my_dataset path/to/image --label ONE,TWO,THREE

When I try to access the dataset again, whether to add more annotations, export the JSONL file, or just check the stats, it returns:

14:59:17 - APP: Using Hug endpoints (deprecated)
14:59:17 - DB: Initialising database MySQL
/home/shiftu/.local/lib/python3.6/site-packages/pymysql/cursors.py:170: Warning: (3090, "Changing sql mode 'NO_AUTO_CREATE_USER' is deprecated. It will be removed in a future release.")
  result = self._query(query)
14:59:17 - DB: Connecting to database MySQL
14:59:17 - DB: Loading dataset 'my_dataset' (290 examples)
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/shiftu/.local/lib/python3.6/site-packages/prodigy/__main__.py", line 372, in <module>
    plac.call(commands[command], arglist=args, eager=False)
  File "/home/shiftu/.local/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/shiftu/.local/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/shiftu/.local/lib/python3.6/site-packages/prodigy/__main__.py", line 263, in db_out
    examples = DB.get_dataset(set_id)
  File "/home/shiftu/.local/lib/python3.6/site-packages/prodigy/components/db.py", line 296, in get_dataset
    return [eg.load() for eg in examples]
  File "/home/shiftu/.local/lib/python3.6/site-packages/prodigy/components/db.py", line 296, in <listcomp>
    return [eg.load() for eg in examples]
  File "/home/shiftu/.local/lib/python3.6/site-packages/prodigy/components/db.py", line 99, in load
    return srsly.json_loads(content)
  File "/home/shiftu/.local/lib/python3.6/site-packages/srsly/_json_api.py", line 37, in json_loads
    return ujson.loads(data)
ValueError: Unmatched ''"' when when decoding 'string'

Using:
prodigy==1.8.3
python==3.6.8
ujson==1.35
srsly==0.0.7

I want to debug this. Can you help me? Thanks!

Hi! Thanks for the report – this is definitely strange. It seems like the base64 string representing the image gets corrupted somehow.

How big are your images? And when you look at what's in your MySQL database, can you see any JSON blobs that are cut off? And if so, where? If the images are very large, one possible explanation is that the JSON gets truncated when it's saved to the DB.
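To check for cut-off blobs, something like the following could help. It's only a rough sketch and assumes you've already pulled the raw content values out of the database yourself. The `find_broken_blobs` helper and the sample rows are made up for illustration and aren't part of Prodigy:

```python
import json


def find_broken_blobs(rows):
    """Return (row_id, size, error) for every blob that isn't valid JSON.

    `rows` is any iterable of (row_id, content) pairs, where content is
    the raw str or bytes value stored in the database.
    """
    broken = []
    for row_id, content in rows:
        if isinstance(content, bytes):
            content = content.decode("utf8", errors="replace")
        try:
            json.loads(content)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            broken.append((row_id, len(content), str(err)))
    return broken


# Hypothetical sample data: the second blob is cut off mid-string.
rows = [
    (1, b'{"image": "data:image/png;base64,AAAA", "answer": "accept"}'),
    (2, b'{"image": "data:image/png;base64,AAAA'),
]
for row_id, size, error in find_broken_blobs(rows):
    print(row_id, size, error)
```

If the broken blobs all stop at the same length, that would point to a size limit on the column rather than random corruption (MySQL's BLOB type, for example, holds at most 65,535 bytes).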

If your images are large and you can't easily change that, one solution would be to not encode the entire image as a string and instead load the images via URLs (and maybe keep an additional reference to the image ID so you can always relate the annotations back). Your input data could then be a JSONL file, and you could specify --loader jsonl in image.manual. For example:

{"image": "https://example.com/image1.jpg", "id": 123}
{"image": "https://example.com/image2.jpg", "id": 456}

One thing to note: using local file paths for the images isn't going to work, since modern browsers typically block those for security reasons. So you'd either have to start a simple local server to host the directory of images, or upload them somewhere (like an S3 bucket).
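For the local server option, Python's built-in server is probably the quickest route: run `python -m http.server 8000` from inside the image directory. The sketch below then maps file names to the URLs that server would expose. The `local_image_urls` helper, the port and the file names are all assumptions for illustration:

```python
import os
from urllib.parse import quote

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def local_image_urls(filenames, base_url="http://localhost:8000"):
    """Build a URL for each image file name, skipping non-image files."""
    urls = []
    for name in sorted(filenames):
        ext = os.path.splitext(name)[1].lower()
        if ext in IMAGE_EXTENSIONS:
            # quote() escapes spaces and other characters unsafe in URLs
            urls.append(base_url + "/" + quote(name))
    return urls


# Hypothetical directory listing, e.g. from os.listdir("path/to/image")
print(local_image_urls(["scan 01.jpg", "scan02.PNG", "notes.txt"]))
```

The resulting URLs can go into the "image" field of the JSONL tasks. The server has to stay running for as long as you're annotating.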

Yes, the images were 10 MB+ each. I've reduced the image resolution, and it works now.

THIS IS AMAZING!!! I'll be considering this approach for my next requirement, for sure. I'm sure the processes will be faster in this scenario.

Noted. Thank you, Ines. :slight_smile:
