Clarification on annotation capabilities

Hi @c00lcoder!

Thanks for the clarification.

The before_db callback could help with the second step, where the data is modified.

As described previously, it would need to be a two-step process. First, you create your bounding boxes. Then you iterate over each span separately, show the choices for that span, and use the before_db callback to write the selected choices onto the span.

It's important to know that the before_db callback should be used sparingly and with caution. The docs mention this:

The before_db callback modifies the annotations and Prodigy will place whatever it returns in the database. You should therefore use it cautiously, since a small bug in your code could lead to data loss.
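One way to lower that risk is to write the callback so that it only ever adds a key and skips anything that doesn't look as expected. Here's a minimal sketch, assuming the task layout used in the recipe further down (exactly one span per task, with the selected option IDs under "accept"). The name safe_before_db is made up for illustration and isn't part of Prodigy, and skipping non-accepted answers is my own assumption about what you'd want:

```python
def safe_before_db(examples):
    """Attach the selected choices to the single span of each accepted task.

    Hypothetical helper: it only adds the "additional_info" key and never
    removes, reorders, or replaces examples.
    """
    for eg in examples:
        spans = eg.get("spans") or []
        # Leave rejected/ignored tasks and unexpected layouts untouched.
        if eg.get("answer") != "accept" or len(spans) != 1:
            continue
        # Copy the list so the span doesn't share a reference with eg["accept"].
        spans[0]["additional_info"] = list(eg.get("accept", []))
    # Return the same list that came in; this is what ends up in the database.
    return examples
```

You'd pass this as the "before_db" component of a recipe in the same way as the callback in step 2.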

Step 1: Get person bounding boxes

Let's say you start with a .jsonl file that has the path to your images:

# image-sample.jsonl
{"image": "person-image.png"}

You first create the bounding boxes for the person by running:

python -m prodigy image.manual image-sample image-sample.jsonl --loader jsonl --label PERSON

Save your annotations. Your PERSON bounding boxes are now stored in the Prodigy dataset image-sample.

Step 2: Custom recipe for additional attributes

Here's a custom recipe to provide four choice attributes for each person (e.g., female/male, brown/black hair).

# image-nesting.py
import prodigy
from prodigy.components.db import connect
from prodigy import set_hashes

@prodigy.recipe(
    "image.nesting",
    dataset=("The dataset to use", "positional", None, str),
    origin_dataset=("The original dataset to get the images from", "positional", None, str),
)
def image_nesting(dataset: str, origin_dataset: str):
    db = connect()
    data = db.get_dataset(name=origin_dataset)
    
    def get_stream():
        for image in data:
            if "spans" in image:
                # Create one task per bounding box so each person is annotated separately
                for span in image["spans"]:
                    image_copy = image.copy()
                    
                    image_copy["spans"] = [span]
                    image_copy["options"] = [
                        {"id": "female", "text": "female"},
                        {"id": "male", "text": "male"},
                        {"id": "brown", "text": "brown hair"},
                        {"id": "black", "text": "black hair"},
                    ]
                    # Reuse the span ID as the task ID and rehash so every span gets its own task
                    image_copy["id"] = span["id"]
                    yield set_hashes(
                        image_copy,
                        task_keys=("spans", "image", "id"),
                        input_keys=("spans", "image", "id"),
                        overwrite=True,
                    )
    
    stream = get_stream()
    
    def before_db(examples):
        for eg in examples:
            if "spans" in eg and "accept" in eg:
                eg["spans"][0]["additional_info"] = eg["accept"]
        return examples
        
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "choice",
        "config": {
            "choice_style": "multiple"
        },
        "before_db": before_db,
    }

You can then run this recipe as:

python -m prodigy image.nesting image-nesting image-sample -F image-nesting.py

You can then export the new annotations and inspect them (one record shown, pretty-printed for readability):

python -m prodigy db-out image-nesting > image-nesting.jsonl

# image-nesting.jsonl
{
  "image": "data:image/png;base64, ...", # removed actual base64 for example
  "_input_hash": 1875068863,
  "_task_hash": -354213639,
  "_is_binary": false,
  "path": "person-image.png",
  "_view_id": "choice",
  "width": 400,
  "height": 267,
  "spans": [
    {
      "id": "5b63d3f8-e4c8-4160-8a7f-d6579ce210ad",
      "label": "PERSON",
      "color": "yellow",
      "x": 56.3,
      "y": 9,
      "height": 247,
      "width": 178,
      "center": [
        145.3,
        132.5
      ],
      "type": "rect",
      "points": [
        [
          56.3,
          9
        ],
        [
          56.3,
          256
        ],
        [
          234.3,
          256
        ],
        [
          234.3,
          9
        ]
      ],
      "additional_info": [
        "female",
        "brown"
      ]
    }
  ],
  "answer": "accept",
  "_timestamp": 1672848904,
  "options": [
    {
      "id": "female",
      "text": "female"
    },
    {
      "id": "male",
      "text": "male"
    },
    {
      "id": "brown",
      "text": "brown hair"
    },
    {
      "id": "black",
      "text": "black hair"
    }
  ],
  "id": "5b63d3f8-e4c8-4160-8a7f-d6579ce210ad",
  "config": {
    "choice_style": "multiple"
  },
  "accept": [
    "female",
    "brown"
  ]
}
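
Because the recipe creates one task per span, the export contains one record per person. If you need one record per image again, something like the following could regroup them. This is only a sketch: merge_by_image and read_jsonl are made-up helpers, and it assumes every record carries the "path" key shown above and that you only want accepted answers.

```python
import json
from collections import defaultdict


def merge_by_image(records):
    """Group single-span records back into one record per image path."""
    merged = defaultdict(list)
    for rec in records:
        # Assumption: only accepted answers should be kept.
        if rec.get("answer") != "accept":
            continue
        # Each span already carries its "additional_info" from before_db.
        merged[rec["path"]].extend(rec.get("spans", []))
    return [{"path": path, "spans": spans} for path, spans in merged.items()]


def read_jsonl(file_path):
    """Read a JSONL file (one JSON object per line) into a list of dicts."""
    with open(file_path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage, with the file written by the db-out command above:
# per_image = merge_by_image(read_jsonl("image-nesting.jsonl"))
```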

Does this solve your problem?