I have created a python recipe and everything works fine so far.
But now I want to customise the JSONL output to match our project requirements. We are annotating with `image_manual` and `text_input`.
The desired JSON/JSONL output should look something like this:
Hi! That should be no problem! You can use Prodigy's Database API to connect to the database, load your dataset and then perform any transformation you need for your specific use case.
You can see examples of the format produced by the image UIs here: https://prodi.gy/docs/api-interfaces#image. For each bounding box or polygon you drew on the image, `"spans"` contains an entry with `"points"` (the `[x, y]` coordinates of the points) and a `"label"` (the label you selected).
So the format in your example could be created like this:
- `filename`: by default, `task["meta"]["file"]`
- `boxes`: each entry in `task["spans"]` describes one box and its `[x, y]` pixel coordinates, and the logic here is pretty simple:
  - `top`: smallest y coordinate
  - `left`: smallest x coordinate
  - `width`: largest x coordinate minus smallest x coordinate
  - `height`: largest y coordinate minus smallest y coordinate
- `label`: either the span's label or the text input? This depends on what your label values mean and how they map to the numbers.
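The box math above could be sketched like this (a minimal example, assuming the standard `"spans"` format produced by the image UIs; `spans_to_boxes` is just a hypothetical helper name):

```python
def spans_to_boxes(task):
    """Convert the "points" of each span to a top/left/width/height box."""
    boxes = []
    for span in task.get("spans", []):
        xs = [point[0] for point in span["points"]]
        ys = [point[1] for point in span["points"]]
        boxes.append({
            "top": min(ys),
            "left": min(xs),
            "width": max(xs) - min(xs),
            "height": max(ys) - min(ys),
            "label": span.get("label"),
        })
    return boxes


# Example: a rectangle drawn from (100, 100) to (300, 250)
task = {
    "spans": [
        {"points": [[100, 100], [300, 100], [300, 250], [100, 250]], "label": "CAR"}
    ]
}
print(spans_to_boxes(task))
# → [{'top': 100, 'left': 100, 'width': 200, 'height': 150, 'label': 'CAR'}]
```

Using `min`/`max` over all points also works for polygons, where this gives you the tightest axis-aligned bounding box around the shape.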
How you integrate the text input from the `text_input` block depends on what the input "means" and how you've configured it. By default, the result gets saved to `"user_input"`, so the value of `task["user_input"]` is whatever the user typed in.
You shouldn't have to change anything about the database – I think it makes more sense to have Prodigy store the underlying data in its original format. That also makes it easy to load the data back in for annotation. But once you're ready to use the annotations you've created, you can perform your postprocessing to produce the format you need – either by running it over the exported JSONL, or by loading your data in Python straight from Prodigy.
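Putting the pieces together, a postprocessing script over the exported JSONL could look something like this. It's only a sketch under assumptions about your target schema: the `"text"` key for the typed input and the `convert_task`/`convert_file` names are made up here, so adjust them to your project's format.

```python
import json


def convert_task(task):
    """Map one annotated task to the project's output format.
    The "text" key is an assumption - rename it to match your schema."""
    boxes = []
    for span in task.get("spans", []):
        xs = [point[0] for point in span["points"]]
        ys = [point[1] for point in span["points"]]
        boxes.append({
            "top": min(ys),
            "left": min(xs),
            "width": max(xs) - min(xs),
            "height": max(ys) - min(ys),
            "label": span.get("label"),
        })
    return {
        "filename": task["meta"]["file"],
        "boxes": boxes,
        # By default, the text_input block saves the typed text here:
        "text": task.get("user_input"),
    }


def convert_file(in_path, out_path):
    """Read exported JSONL line by line and write the converted records."""
    with open(in_path, encoding="utf8") as f_in, \
            open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            f_out.write(json.dumps(convert_task(json.loads(line))) + "\n")
```

Instead of reading an exported file, you could also fetch the examples straight from Prodigy's database in Python (`from prodigy.components.db import connect`, then load your dataset from the returned database object) and feed each example through the same `convert_task` function.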