Multilabel image classification on a bounding box

I'm doing multilabel classification. The process is: a person highlights part of the image with a bounding box, then picks 2-3 labels that fit the box. For example, in action recognition, a person can be standing and pointing, or running and bending. In fashion, for fine-grained classification, an object can have multiple attributes.

Can you point me to a recipe or give me an outline of how to make one that works for this?

Hi! A similar(ish) question came up in this thread the other day, and for cases like this, we'd typically recommend making two passes over the annotations, one for each objective: first getting the bounding box right, then categorising the content labelled by the bounding box.

In terms of implementation, the logic in your recipe could look something like this:

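import copy

import prodigy

# Labels the annotator can choose from for each bounding box
options = [
    {"id": "STAND", "text": "🧍 standing"},
    {"id": "POINT", "text": "👉 pointing"}
]

def get_box_classification_stream(stream):
    for eg in stream:
        for span in eg.get("spans", []):  # the bounding boxes
            # Copy the original example so every box gets its own task
            task = copy.deepcopy(eg)
            task["spans"] = [span]
            task["options"] = options
            # overwrite=True in case the incoming examples already carry hashes
            task = prodigy.set_hashes(task, overwrite=True)
            yield task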
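
If you want to check the splitting logic before wiring it into a recipe, here's a rough standalone sketch that runs without Prodigy. The sample task and its field values are made up for illustration, `split_boxes` is a hypothetical name, and hashing is left out on purpose:

```python
import copy

# Made-up sample task with two boxes; the values are illustrative only.
sample = {
    "image": "person.jpg",
    "spans": [
        {"label": "PERSON", "points": [[10, 10], [10, 90], [60, 90], [60, 10]]},
        {"label": "PERSON", "points": [[70, 20], [70, 95], [120, 95], [120, 20]]},
    ],
}
options = [
    {"id": "STAND", "text": "standing"},
    {"id": "POINT", "text": "pointing"},
]

def split_boxes(stream, options):
    """Yield one classification task per bounding box (hashing omitted)."""
    for eg in stream:
        for span in eg.get("spans", []):
            task = copy.deepcopy(eg)   # leave the original example untouched
            task["spans"] = [span]     # keep only the box being classified
            task["options"] = options  # choices shown for this box
            yield task

tasks = list(split_boxes([sample], options))
print(len(tasks))                        # one task per box
print([len(t["spans"]) for t in tasks])  # each task carries a single box
```

An image with two boxes should come out as two separate tasks, each with one box and the full list of options.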

Another advantage of this approach is that it makes it easy to slowly start automating parts of the process. For instance, recognising the actual person (i.e. drawing the box) might be something your model can do quite accurately pretty quickly, so you can mix in suggestions from the model and focus only on the classification of each box.
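
As a rough idea of what mixing in suggestions could look like, here's a minimal sketch in plain Python. Both `add_box_suggestions` and `dummy_predict_boxes` are hypothetical names, and the dummy function just stands in for whatever detector you end up using; it's an assumption for illustration, not part of Prodigy:

```python
def add_box_suggestions(stream, predict_boxes):
    """Attach model-predicted boxes to tasks that don't have any boxes yet."""
    for eg in stream:
        if not eg.get("spans"):
            eg = dict(eg)  # shallow copy so the incoming task isn't mutated
            eg["spans"] = predict_boxes(eg)
        yield eg

def dummy_predict_boxes(eg):
    # Stand-in for a real detector: always returns one fixed box.
    return [{"label": "PERSON", "points": [[0, 0], [0, 50], [50, 50], [50, 0]]}]

# Made-up input: one image without boxes, one with a manually drawn box.
stream = [
    {"image": "a.jpg"},
    {"image": "b.jpg", "spans": [{"label": "PERSON", "points": [[1, 1], [1, 9], [9, 9], [9, 1]]}]},
]
suggested = list(add_box_suggestions(stream, dummy_predict_boxes))
print([len(eg["spans"]) for eg in suggested])
```

Manually drawn boxes are kept as they are, and only images without boxes get a suggestion. The result could then be passed on to get_box_classification_stream so each box is classified on its own.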

I missed that question. It is very similar to my use case.

And thank you for the recipe. It is very helpful as I'm still figuring my way around Prodigy and writing custom recipes.