Selecting multiple labels in image.manual


I thought this might be a simple task but I wasn't able to set this up nor find anything in the forum.

Basically, I want to select multiple labels for every image in my images dataset. For example, if I have an image that has people in it, I want to label their clothes. Now for every type of clothes (clothes class), I can select the color that is appearing (colors class). So, if an image shows a woman wearing a white shirt, I can select the label of the color (white) and then the label of the clothes type (shirt) in order to label that image.

The only thing I could do is create a combination of all clothes types and colors but that's of course not feasible at all.

Thank you for your support.

@ines would appreciate your help with this

Hi! So when you're annotating your images, is your goal to draw boxes around all items of clothing? Or just the people? And is your end goal to train a model?

In general, we'd probably recommend doing this in two passes and first focus on getting the right boxes defined – for example, PERSON or SHIRT (if you're selecting the items of clothing explicitly). If the boxes are wrong, all subsequent decisions won't really be usable, so it's helpful to get the basic annotation layer validated first. It also means you can already separate out the images you're not interested in (e.g. images without people).

In the next step, you can then ask about more specific attributes like colour etc. and stream in one object at a time. If you already have the bounding boxes, you can use tricks here to make the annotation more efficient. For instance, do all shirts first, then all pants, so the annotator can stay in the same mindset. Those processes often get inefficient and more error-prone when you're asking human annotators to perform too many decisions at once (or the bounding box annotation gets held up because someone can't decide on whether a shirt is green, blue or turquoise, or you come across something like "the dress" :sweat_smile:). Color detection might even be something you can automate with decent accuracy these days, so you could even experiment with using a model in the loop for this.

Thank you @ines for your reply! Our goal is to draw boxes on all items of clothing and select their type and color. I like the idea of two passes that you explained. How do I exactly pass the selected objects (items of clothing) one by one in the second pass to choose their color? I was trying to do a custom recipe for image.manual that takes the resulting jsonl file of the first annotation pass and uses split_spans to pass the stream of objects alone. I'm not sure if this works for images as well, but this is the only thing I found so far in the documentation. Am I doing it wrong? If so, can you please guide me on how to do it correctly? Appreciate it!

Yes, that's pretty much what I had in mind: for each example annotated with bounding boxes, you create a new example for each bounding box that contains only that one bounding box. In code, it would look something like this:

from prodigy.components.db import connect
import copy

db = connect()
dataset = db.get_dataset("your_dataset_here")
examples = [] 

for task in dataset:
    for span in task.get("spans", []):  # create one example per bounding box
        eg = copy.deepcopy(task)  # copy the full example for each bounding box
        eg["spans"] = [span]      # keep only this single bounding box
        examples.append(eg)

You could then use examples as the input stream and add a key "options" to each example with the multiple choice options to choose from (color, other attributes, etc.). You could also sort the list of examples before you send them out, for example, by span["label"], so you do all shirts first, then all pants, and so on.
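To make that concrete, here's a minimal sketch of attaching "options" and sorting by span label. The helper name `add_options` and the color list are my own illustrations, not part of Prodigy's API; the options follow the {"id": ..., "text": ...} format the choice interface expects.

```python
import copy

def add_options(examples, colors):
    # Hypothetical helper: attach multiple-choice options to each example
    # and sort by the span label, so all items of the same clothing type
    # are annotated together.
    options = [{"id": c, "text": c} for c in colors]
    out = []
    for eg in examples:
        eg = copy.deepcopy(eg)
        eg["options"] = copy.deepcopy(options)
        out.append(eg)
    out.sort(key=lambda eg: eg["spans"][0]["label"])
    return out

# Toy examples in the one-span-per-example format created above
examples = [
    {"image": "a.jpg", "spans": [{"label": "SHIRT"}]},
    {"image": "b.jpg", "spans": [{"label": "PANTS"}]},
    {"image": "c.jpg", "spans": [{"label": "SHIRT"}]},
]
result = add_options(examples, ["white", "blue", "black"])
# result is grouped: all PANTS examples first, then all SHIRT examples
```

Because Python's sort is stable, examples with the same label keep their original relative order, which is usually what you want for annotation.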

It could also be cool to output some stats at this stage: for example, how many items of clothing are there, which types are common? Which types are super rare, and what could potentially be a mistake?

If you wanted to add more automation, you could also do that here when you create and sort the examples: the span gives you access to the bounding box information (x, y, width, height) and eg["image"] is typically the base64-encoded image data or the image URL. So using an image library, you could already try to guess the dominant color (e.g. like this) and group them together. So your annotators can go through lots of examples in a row, and all they have to think about is something like "are these pants blue? (and if not, what are they?)".
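In practice you'd use an image library like Pillow to load the image and crop out the (x, y, width, height) region of each span. The color-guessing step itself can be as simple as averaging the cropped pixels and snapping to the nearest entry in a small named palette. A dependency-free sketch of that snapping logic (the palette and function names are illustrative, not from any library):

```python
# Hypothetical palette: label -> RGB reference value
PALETTE = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "blue": (0, 0, 255),
    "red": (255, 0, 0),
}

def nearest_color(pixel, palette):
    # Squared Euclidean distance in RGB space -- a rough heuristic,
    # good enough for pre-sorting examples for human review
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(palette, key=lambda name: dist(pixel, palette[name]))

def dominant_color(pixels):
    # Average the cropped pixels, then snap to the nearest palette entry.
    # In a real pipeline, `pixels` would come from cropping the bounding
    # box region out of the image with e.g. Pillow.
    n = len(pixels)
    avg = tuple(sum(p[i] for p in pixels) // n for i in range(3))
    return nearest_color(avg, PALETTE)

guess = dominant_color([(250, 250, 245), (240, 248, 250)])  # mostly white pixels
```

Grouping examples by this guess before streaming them out turns the annotation question into a quick yes/no check ("are these pants blue?") rather than an open choice.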