I have created a python recipe and everything works fine so far.
But now I want to customise the JSONL output to match our project requirements. We are annotating with `image_manual` and `text_input`.
The desired JSON/JSONL output should look something like this:
Hi! That should be no problem! You can use Prodigy's Database API to connect to the database, load your dataset and then perform any transformation you need for your specific use case.
You can see examples of the format produced by the image UIs here: https://prodi.gy/docs/api-interfaces#image. For each bounding box or polygon you drew on the image, `"spans"` contains an entry with `"points"` (the `[x, y]` coordinates of the points) and a `"label"` (the label you selected).
So the format in your example could be created like this:
- `filename`: by default, `task["meta"]["file"]`
- `boxes`: each entry in `task["spans"]` describes one box and its `[x, y]` pixel coordinates, and the logic here is pretty simple:
  - `top`: smallest y coordinate
  - `left`: smallest x coordinate
  - `width`: largest x coordinate minus smallest x coordinate
  - `height`: largest y coordinate minus smallest y coordinate
- `label`: either the span's label or the text input? This depends on what your label values mean and how they map to the numbers.
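The box math above could be sketched like this (a minimal example, assuming the standard `"spans"` format produced by the image UIs; `spans_to_boxes` is just a hypothetical helper name):

```python
def spans_to_boxes(task):
    """Convert the "points" of each span to a top/left/width/height box."""
    boxes = []
    for span in task.get("spans", []):
        xs = [point[0] for point in span["points"]]
        ys = [point[1] for point in span["points"]]
        boxes.append({
            "top": min(ys),
            "left": min(xs),
            "width": max(xs) - min(xs),
            "height": max(ys) - min(ys),
            "label": span.get("label"),
        })
    return boxes


# Example: a rectangle drawn from (100, 100) to (300, 250)
task = {
    "spans": [
        {"points": [[100, 100], [300, 100], [300, 250], [100, 250]], "label": "CAR"}
    ]
}
print(spans_to_boxes(task))
# → [{'top': 100, 'left': 100, 'width': 200, 'height': 150, 'label': 'CAR'}]
```

Using `min`/`max` over all points also works for polygons, where this gives you the tightest axis-aligned bounding box around the shape.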
How you integrate the text input from the `text_input` block depends on what the input "means" and how you've configured it. By default, the result gets saved to `"user_input"`, so the value of `task["user_input"]` is whatever the user typed in.
You shouldn't have to change anything about the database – I think it makes more sense to have Prodigy store the underlying data in its original format. That also makes it easy to load the data back in for annotation. But once you're ready to use the annotations you've created, you can perform your postprocessing to produce the format you need – either by running it over the exported JSONL, or by loading your data in Python straight from Prodigy.
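Putting the pieces together, a postprocessing script over the exported JSONL could look something like this. It's only a sketch under assumptions about your target schema: the `"text"` key for the typed input and the `convert_task`/`convert_file` names are made up here, so adjust them to your project's format.

```python
import json


def convert_task(task):
    """Map one annotated task to the project's output format.
    The "text" key is an assumption - rename it to match your schema."""
    boxes = []
    for span in task.get("spans", []):
        xs = [point[0] for point in span["points"]]
        ys = [point[1] for point in span["points"]]
        boxes.append({
            "top": min(ys),
            "left": min(xs),
            "width": max(xs) - min(xs),
            "height": max(ys) - min(ys),
            "label": span.get("label"),
        })
    return {
        "filename": task["meta"]["file"],
        "boxes": boxes,
        # By default, the text_input block saves the typed text here:
        "text": task.get("user_input"),
    }


def convert_file(in_path, out_path):
    """Read exported JSONL line by line and write the converted records."""
    with open(in_path, encoding="utf8") as f_in, \
            open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            f_out.write(json.dumps(convert_task(json.loads(line))) + "\n")
```

Instead of reading an exported file, you could also fetch the examples straight from Prodigy's database in Python (`from prodigy.components.db import connect`, then load your dataset from the returned database object) and feed each example through the same `convert_task` function.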