Hi! A similar(ish) question came up in this thread the other day, and for cases like this, we'd typically recommend making two passes over the annotations, one per objective: first getting the bounding boxes right, then categorising the content labelled by each box.
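For context, after the first (box-drawing) pass, each incoming task would look roughly like this. The span format below is an assumed example based on image_manual-style output, so the exact keys may differ in your setup:

# One image task with two manually drawn boxes (keys assumed,
# roughly what an image_manual-style pass would produce)
eg = {
    "image": "photos/scene_01.jpg",
    "spans": [
        {"label": "PERSON", "points": [[34, 12], [154, 12], [154, 322], [34, 322]]},
        {"label": "PERSON", "points": [[400, 50], [495, 50], [495, 330], [400, 330]]},
    ],
}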
In terms of implementation, the logic in your recipe could look something like this:
import copy
import prodigy

options = [
    {"id": "STAND", "text": "🧍 standing"},
    {"id": "POINT", "text": "👉 pointing"},
]

def get_box_classification_stream(stream):
    for eg in stream:
        for span in eg.get("spans", []):  # the bounding boxes
            task = copy.deepcopy(eg)      # copy so we don't mutate the original task
            task["spans"] = [span]        # only show this one box
            task["options"] = options     # add the choice options
            # re-hash so each box counts as a separate question
            task = prodigy.set_hashes(task, overwrite=True)
            yield task
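To plug this into a custom recipe, you could wrap it up roughly like this. This is just a sketch assuming your data comes in as a JSONL file of image tasks with pre-drawn "spans"; the recipe name image.box-classify is made up for the example:

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "image.box-classify",
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("JSONL file of image tasks with pre-drawn boxes", "positional", None, str),
)
def image_box_classify(dataset, source):
    stream = JSONL(source)  # tasks like {"image": ..., "spans": [...]}
    return {
        "dataset": dataset,
        "stream": get_box_classification_stream(stream),
        "view_id": "choice",                   # image plus multiple-choice options
        "config": {"choice_style": "single"},  # one pose per box
    }

You'd then start it with something like prodigy image.box-classify my_dataset ./boxes.jsonl -F recipe.py.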
Another advantage of this approach is that it makes it easy to gradually automate parts of the process. For instance, recognising the actual person (i.e. drawing the box) might be something your model learns to do quite accurately early on, so you can mix in suggestions from the model and focus only on the classification of each box.
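To make that concrete, here's a minimal sketch of what mixing in suggestions could look like. The detect_boxes function is hypothetical, standing in for whatever model you end up training; it's assumed to take an image and return a list of span dicts in the same format as manually drawn boxes:

def add_model_boxes(stream, detect_boxes):
    # detect_boxes: hypothetical callable, image -> list of span dicts
    for eg in stream:
        if not eg.get("spans"):  # only suggest boxes where none exist yet
            eg["spans"] = detect_boxes(eg["image"])
        yield eg

The two wrappers then compose nicely, e.g. get_box_classification_stream(add_model_boxes(stream, detect_boxes)), so boxes get drawn or suggested first and each one is classified afterwards.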