Custom db-out output format

I have created a Python recipe and everything works fine so far.
But now I want to customise the JSONL output to match our project requirements. We are annotating with image_manual and text_input.

The desired json/jsonl output should look something like this:

  {
    "filename": "1.png",
    "boxes": [
      {
        "height": 219.0,
        "label": 1.0,
        "left": 246.0,
        "top": 77.0,
        "width": 81.0
      },
      {
        "height": 219.0,
        "label": 9.0,
        "left": 323.0,
        "top": 81.0,
        "width": 96.0
      }
    ]
  }

So we want just the boxes, the labels and image name.

But I just cannot find any suggestions or walkthroughs on how to do so. I hope you can help me.

Thanks in advance

Hi! That should be no problem :slightly_smiling_face: You can use Prodigy's Database API to connect to the database, load your dataset and then perform any transformation you need for your specific use case.

You can see examples of the format produced by the image UIs in the documentation. For each bounding box or polygon that you drew on the image, the "spans" list contains an entry with "points" (the [x, y] coordinates of the points) and a "label", the label you selected.

So the format in your example could be created like this:

  • filename: by default, task["meta"]["file"]
  • boxes: each entry in task["spans"] describes one box and its [x, y] pixel coordinates, and the logic here is pretty simple:
    • top: smallest y coordinate
    • left: smallest x coordinate
    • width: largest x coordinate minus smallest x coordinate
    • height: largest y coordinate minus smallest y coordinate
    • label: either the span's label or the text input? This depends on what your label values mean and how they map to the numbers
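
The mapping above can be sketched in a few lines of Python. This is only a minimal sketch: the helper names span_to_box and task_to_record are made up for illustration, and it assumes every span has a "points" list of [x, y] pairs and that the file name lives in task["meta"]["file"] as described above. The label is passed through unchanged here.

```python
def span_to_box(span):
    """Turn one span with [x, y] points into a top/left/width/height box."""
    xs = [point[0] for point in span["points"]]
    ys = [point[1] for point in span["points"]]
    return {
        "height": max(ys) - min(ys),  # largest y minus smallest y
        "label": span["label"],       # passed through as-is (assumption)
        "left": min(xs),              # smallest x coordinate
        "top": min(ys),               # smallest y coordinate
        "width": max(xs) - min(xs),   # largest x minus smallest x
    }


def task_to_record(task):
    """Build the {"filename": ..., "boxes": [...]} record for one task."""
    return {
        "filename": task["meta"]["file"],
        "boxes": [span_to_box(span) for span in task.get("spans", [])],
    }


# Hand-written example task, not real annotation data.
example_task = {
    "meta": {"file": "1.png"},
    "spans": [
        {
            "label": "1",
            "points": [[246.0, 77.0], [246.0, 296.0], [327.0, 296.0], [327.0, 77.0]],
        }
    ],
}
record = task_to_record(example_task)
```

For the hand-written task above, record["boxes"][0] has left 246.0, top 77.0, width 81.0 and height 219.0. Because the box is taken from the min and max coordinates, the same function also gives the enclosing rectangle of a polygon.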

How you integrate the text input from the text_input block depends on what the input "means" and how you've configured it. By default, the result gets saved to user_input, so the value of task["user_input"] is whatever the user typed in.
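
Here is one possible way to combine the two, purely as an illustration. It assumes the typed text should override the span's label whenever the annotator entered something, that a single user_input applies to every box in the task, and that numeric-looking labels should become floats like in the desired output. The function name resolve_label and the prefer_text_input flag are hypothetical, so adjust the logic to whatever the input means in your project.

```python
def resolve_label(span, task, prefer_text_input=True):
    """Pick the label for a box from the text input or the span itself."""
    # task["user_input"] is where the text_input block stores its value by default.
    typed = (task.get("user_input") or "").strip()
    raw = typed if (prefer_text_input and typed) else span.get("label")
    try:
        # Assumption: labels such as "1" or "9" should be exported as 1.0 / 9.0.
        return float(raw)
    except (TypeError, ValueError):
        # Non-numeric (or missing) labels are returned unchanged.
        return raw


label_from_input = resolve_label({"label": "1"}, {"user_input": "9"})
label_from_span = resolve_label({"label": "1"}, {})
```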

Thank you,
that worked for me. I do not understand how to build my own database, but at least I can edit and reformat the existing output.

Glad it worked! :slightly_smiling_face:

You shouldn't have to change anything about the database – I think it makes more sense to have Prodigy store the underlying data in its original format. That also makes it easy to load the data back in for annotation. But once you're ready to use the annotations you've created, you can perform your postprocessing and get out the desired format you need – either by running it over the exported JSON, or by loading your data in Python straight from Prodigy.
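
Both routes could look roughly like the sketch below. The helper names (read_jsonl, write_jsonl, load_from_prodigy, postprocess) are made up for illustration. The database part assumes connect() from prodigy.components.db and a get_dataset method on the returned object, which may be named differently depending on your Prodigy version, so check the Database API docs for the version you have installed. Keeping only examples whose "answer" is "accept" is also an assumption about what you want to export.

```python
import json


def read_jsonl(path):
    """Read a JSONL file (e.g. the output of db-out) into a list of dicts."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_from_prodigy(dataset_name):
    """Load examples straight from the Prodigy database (needs Prodigy installed)."""
    # Imported lazily so the file-based helpers also work without Prodigy.
    from prodigy.components.db import connect

    db = connect()  # uses the database settings from your prodigy.json
    return db.get_dataset(dataset_name)  # assumption: method name may vary by version


def postprocess(examples, transform):
    """Apply your own transform function to every accepted example."""
    return [transform(eg) for eg in examples if eg.get("answer") == "accept"]
```

You would then call something like write_jsonl("custom.jsonl", postprocess(read_jsonl("export.jsonl"), my_transform)), where my_transform is your own function that builds the desired dict from a task and the file names are placeholders. The data in the database stays untouched either way.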