Using image.manual to correct bounding box annotations

Hello,

I want to use Prodigy to correct the annotations produced by a vision ML algorithm.
I currently have a JSON file that is structured almost exactly like what's shown in the documentation. The only difference is that each top-level key corresponds to the frame number of the video:

{
    "0": [
        {
            "x": 309,
            "y": 380,
            "width": 73,
            "height": 37,
            "points": [[309, 380], [309, 453], [346, 380], [346, 453]],
            "center": [327.5, 416.5],
            "label": 3
        },
        {
            [...]
        }
    ],
    "1": [
        [...]
    ]
}

I have every frame of that video exported to a folder (each named "[FrameNumber].png").
I want to use image.manual to review and correct the annotations.

I'm facing two problems:

  1. How do I format the JSON file in order to import it into the Prodigy database?

Prodigy doesn't seem to like the way I use a dictionary to link each frame number to its annotations.
Instead of using a dictionary, am I forced to store every image (as base64-encoded image data) directly in the imported JSON file? Isn't there a way to link directly to a local file?

  2. How do I export the database without the base64 image data associated with each annotation?

When I use the db-out command on a dataset, it gives me the annotations in the correct format, but the file also contains the base64 image data.
Is there a way to export only the annotation data from the database and not the image data?

Thanks,

Best,
Gautier

You don't need to import anything upfront to annotate it with Prodigy – you can also write your own custom recipe that takes the input data in your format and generates annotation examples from it. Or you can generate them as a preprocessing step – that's up to you :slightly_smiling_face:
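If you go the recipe route, a minimal sketch could look something like this – the recipe and dataset names are made up, and it assumes you've already converted your data to a JSONL file of tasks (a sketch of that conversion follows below):

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe("image-boxes")
def image_boxes(dataset, source):
    stream = JSONL(source)  # pre-converted tasks, one JSON object per line
    return {
        "dataset": dataset,        # dataset the annotations are saved to
        "stream": stream,          # iterable of annotation tasks
        "view_id": "image_manual"  # manual bounding box interface
    }

You'd then start it with something like: prodigy image-boxes frame_boxes ./tasks.jsonl -F recipe.py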

The only thing that's important is that the data that gets sent out by your stream follows the expected JSON format. You can find an example of the data format here: Annotation interfaces · Prodigy · An annotation tool for AI, Machine Learning & NLP. Each example should have a key "image" and a key "spans" containing a list of bounding boxes, just like the ones you already have in your data.
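For example, a preprocessing script that converts your per-frame dictionary into that format could look roughly like this – the file paths, the localhost URL and the numeric-to-string label mapping are all assumptions you'd adapt to your setup:

import json

LABELS = {3: "CAR"}  # hypothetical mapping of your numeric label IDs to names

with open("annotations.json", encoding="utf8") as f:
    frames = json.load(f)

with open("tasks.jsonl", "w", encoding="utf8") as f:
    for frame_number, boxes in frames.items():
        task = {
            # URL the frame is served under, e.g. by a local web server
            "image": f"http://localhost:8000/{frame_number}.png",
            # one span per box, with the four corners in clockwise order
            "spans": [
                {
                    "points": [
                        [b["x"], b["y"]],
                        [b["x"] + b["width"], b["y"]],
                        [b["x"] + b["width"], b["y"] + b["height"]],
                        [b["x"], b["y"] + b["height"]],
                    ],
                    "label": LABELS.get(b["label"], str(b["label"])),
                }
                for b in boxes
            ],
        }
        f.write(json.dumps(task) + "\n")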

You don't have to use base64 – you can also set "image" to a URL (e.g. hosted in an S3 bucket or a local web server). The problem with local file paths is mainly that modern browsers typically block them for security reasons – so you either want to send the image data with each task (works well for smaller images) or serve them somewhere. Prodigy also comes with Images and ImageServer loaders that help you load/serve files from a directory: https://prodi.gy/docs/api-loaders#loaders-file
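For example, if you'd rather send the image data along with each task, you could combine the Images loader with your existing boxes – this sketch assumes the loader records the file name in each task's "meta", and reuses the same corner conversion as above:

import json
from pathlib import Path
from prodigy.components.loaders import Images

with open("annotations.json", encoding="utf8") as f:
    frames = json.load(f)

def add_spans(stream):
    for eg in stream:
        # assumption: the loader stores the file name in eg["meta"]["file"]
        frame_number = Path(eg["meta"]["file"]).stem  # "42.png" -> "42"
        eg["spans"] = [
            {
                "points": [
                    [b["x"], b["y"]],
                    [b["x"] + b["width"], b["y"]],
                    [b["x"] + b["width"], b["y"] + b["height"]],
                    [b["x"], b["y"] + b["height"]],
                ],
                "label": str(b["label"]),
            }
            for b in frames.get(frame_number, [])
        ]
        yield eg

stream = add_spans(Images("./frames"))  # tasks with base64 image data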

The --remove-base64 flag on the built-in image.manual recipe will remove the base64-encoded data before the examples are placed in the database. See here: Built-in Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP
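For example (the dataset name and labels here are made up):

prodigy image.manual frame_boxes ./frames --label CAR,PERSON --remove-base64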

In a custom recipe, you can use the before_db callback to implement this (or make any other modifications to the JSON data before saving it): Custom Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP
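A minimal sketch of such a callback, assuming your stream stored the original file path in each task's "meta" when the examples were created:

def before_db(examples):
    # replace the base64 data URI with the stored file path, so the
    # database only keeps the annotation data, not the image itself
    for eg in examples:
        if eg["image"].startswith("data:"):
            eg["image"] = eg["meta"]["file"]
    return examples

You'd then add "before_db": before_db to the components dictionary returned by your recipe.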

Thank you very much for your detailed answers! It now works perfectly. :slightly_smiling_face:

Regards,

Gautier
