hi @c00lcoder!
Thanks for the clarification.
The before_db callback could help with the second step to modify the data.
As described previously, it would need to be a two-step process: first you create your bounding boxes, then you add the attributes by iterating over each span separately and attaching the selected choices to it in a before_db callback.
It's important to know that the before_db callback should be used sparingly and with caution. The docs mention this:
"The before_db callback modifies the annotations and Prodigy will place whatever it returns in the database. You should therefore use it cautiously, since a small bug in your code could lead to data loss."
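Just to make the shape of that callback concrete: it receives the list of answered examples right before they're saved and must return the list that should actually go into the database. Here's a minimal, defensive sketch (not the final recipe, that follows in Step 2) that works on copies, so a bug can't mutate the originals in place:

import copy

def before_db(examples):
    out = []
    for eg in examples:
        eg = copy.deepcopy(eg)  # work on a copy so a bug can't corrupt the original example
        # Only touch examples that actually have a span and a selected choice
        if eg.get("spans") and eg.get("accept"):
            eg["spans"][0]["additional_info"] = eg["accept"]
        out.append(eg)
    return out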
Step 1: Get person bounding boxes
Let's say you start with a .jsonl file that has the path to your images:
# image-sample.jsonl
{"image": "person-image.png"}
You first create the bounding boxes for the person by running:
python -m prodigy image.manual image-sample image-sample.jsonl --loader jsonl --label PERSON
Save your annotations; your person bounding boxes are now stored in the Prodigy dataset image-sample.
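Before moving on, you can quickly sanity-check that the PERSON spans were saved, for example by exporting the first annotation:

python -m prodigy db-out image-sample | head -n 1

Each line should contain a spans list with one entry per bounding box you drew.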
Step 2: Custom recipe for additional attributes
Here's a custom recipe to provide four choice attributes for each person (e.g., female/male, brown/black hair).
# image-nesting.py
import prodigy
from prodigy.components.db import connect
from prodigy import set_hashes


@prodigy.recipe(
    "image.nesting",
    dataset=("The dataset to use", "positional", None, str),
    origin_dataset=("The original dataset to get the images from", "positional", None, str),
)
def image_nesting(dataset: str, origin_dataset: str):
    db = connect()
    data = db.get_dataset(name=origin_dataset)

    def get_stream():
        # Create one choice task per annotated span
        for image in data:
            if "spans" in image:
                for span in image["spans"]:
                    image_copy = image.copy()
                    image_copy["spans"] = [span]
                    image_copy["options"] = [
                        {"id": "female", "text": "female"},
                        {"id": "male", "text": "male"},
                        {"id": "brown", "text": "brown hair"},
                        {"id": "black", "text": "black hair"},
                    ]
                    # Include the span ID in the hashes so each span becomes a distinct task
                    image_copy["id"] = span["id"]
                    yield set_hashes(
                        image_copy,
                        task_keys=("spans", "image", "id"),
                        input_keys=("spans", "image", "id"),
                        overwrite=True,
                    )

    stream = get_stream()

    def before_db(examples):
        # Attach the selected choices to the span before the example is saved
        for eg in examples:
            if "spans" in eg and "accept" in eg:
                eg["spans"][0]["additional_info"] = eg["accept"]
        return examples

    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "choice",
        "config": {"choice_style": "multiple"},
        "before_db": before_db,
    }
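A note on the design: including the span "id" in both input_keys and task_keys of set_hashes is what makes each person box a separate task, even though several tasks share the same image. If you'd like to preview how many choice tasks the stream will produce before starting the server, something like this sketch works (it assumes the Step 1 dataset is called image-sample):

# preview_tasks.py (sketch: count how many per-span choice tasks Step 2 will generate)
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset(name="image-sample")
n_tasks = sum(len(eg.get("spans", [])) for eg in examples)
print(f"{len(examples)} images -> {n_tasks} per-span choice tasks")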
You can then run this recipe as:
python -m prodigy image.nesting image-nesting image-sample -F image-nesting.py
You can then look at the new annotations with db-out:
python -m prodigy db-out image-nesting > image-nesting.jsonl
# image-nesting.jsonl
{
  "image": "data:image/png;base64, ...",  # removed actual base64 for example
  "_input_hash": 1875068863,
  "_task_hash": -354213639,
  "_is_binary": false,
  "path": "person-image.png",
  "_view_id": "choice",
  "width": 400,
  "height": 267,
  "spans": [
    {
      "id": "5b63d3f8-e4c8-4160-8a7f-d6579ce210ad",
      "label": "PERSON",
      "color": "yellow",
      "x": 56.3,
      "y": 9,
      "height": 247,
      "width": 178,
      "center": [145.3, 132.5],
      "type": "rect",
      "points": [[56.3, 9], [56.3, 256], [234.3, 256], [234.3, 9]],
      "additional_info": ["female", "brown"]
    }
  ],
  "answer": "accept",
  "_timestamp": 1672848904,
  "options": [
    {"id": "female", "text": "female"},
    {"id": "male", "text": "male"},
    {"id": "brown", "text": "brown hair"},
    {"id": "black", "text": "black hair"}
  ],
  "id": "5b63d3f8-e4c8-4160-8a7f-d6579ce210ad",
  "config": {"choice_style": "multiple"},
  "accept": ["female", "brown"]
}
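If you want to use the results downstream, a small script can flatten the exported file into one record per person box. The field names below all come from the output above; the script itself is just a sketch:

# collect_attributes.py (sketch: one record per annotated person box)
import json

records = []
with open("image-nesting.jsonl", encoding="utf8") as f:
    for line in f:
        eg = json.loads(line)
        if eg.get("answer") != "accept":
            continue
        for span in eg.get("spans", []):
            records.append({
                "path": eg.get("path"),
                "span_id": span.get("id"),
                "label": span.get("label"),
                "attributes": span.get("additional_info", []),
            })

print(records)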
Does this solve your problem?