Classify images with binary-option working through multiple labels

Hello,

For classifying a larger number of images, I would like to use the binary-label option.
Is there a way to work through a list of different labels, or to select them in the web interface (see draft), instead of running the annotation process separately for each label?

So after all images are classified with one tag, I can switch to another tag or it automatically moves on to the next tag.

Right now I start with a command like this for each label:
$ prodigy mark finger_lens_images ./images --loader images --label FINGER_OVER_LENS --view-id classification

If possible I would like to avoid the multi-label option, because it is more difficult to check images for a variety of specific elements instead of staying focused on a specific marker for a larger amount of images.

Thank you and best regards
Matt

Yes, this sounds like a reasonable approach :+1: We'd typically recommend against having an option in the UI to change labels or other things about the annotation task, though, because you really want to be able to control what the annotator annotates. Otherwise, you can easily end up with inconsistent data.

If you want to loop over your data multiple times with different labels, you could easily do this in a custom recipe with a nested loop. Once all images are sent out with the first label, you repeat it with the second label, and so on:

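from prodigy.components.loaders import Images

def get_stream():
    for label in ["FIRST_LABEL", "SECOND_LABEL"]:
        # Reload the images for each label, since the loader is consumed as it is iterated
        examples = Images("/path/to/images")
        for eg in examples:
            eg["label"] = label
            yield eg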

In theory, you could also invert the nested loop here and send out every example twice for both labels. But instinctively, I'd say that this might be worse and could more easily lead to mistakes because the annotator always has to think about both labels, and might be more likely to accidentally submit the same image twice with the same decision because they might not notice the label change at the top.

Thank you for your response. Yes, it's a good point to let the annotators stay focused on one label and only switch once they have finished a batch of images.

It works :slightly_smiling_face:
Here is my custom recipe, so everybody with the same task can use it right away.

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe("image-classification-loop",
    dataset=("The dataset to save to", "positional", None, str),
    file_path=("Path to images", "positional", None, str),
)
def image_classification_loop(dataset, file_path):
    # Blocks of the interface
    blocks = [{"view_id": "classification"}]

    def get_stream():
        # Send out all examples with the first label, then again with the next one
        for label in ["FIRST_LABEL", "SECOND_LABEL"]:
            # Reload the file for each label, since the loader is consumed as it is iterated
            examples = JSONL(file_path)
            for eg in examples:
                eg["label"] = label
                yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "blocks",
        "config": {"blocks": blocks}
    }

filename: image_classification.py

Start the recipe like this:
$ python -m prodigy image-classification-loop dataset_saved_to ./load_images_from.jsonl -F image_classification.py

Just one further question: is there an easy way to randomize the order in which the images are shown? Perhaps by randomizing the order in which the JSONL is loaded, directly in the get_stream() function?

Best, Matt

Yay, thanks for sharing!

Yes, in that case, you could just load all examples into memory and shuffle them:

import random

examples = list(JSONL(file_path))
random.shuffle(examples)

You could also do it once at the top of the function if you don't want to re-shuffle them for each label and are okay with them being sent out for each label in the same (random) order.
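Shuffling once before the label loop could look roughly like the sketch below. It uses plain dicts in place of the tasks a loader would produce, and the make_stream helper and its seed argument are names made up for this illustration, not part of Prodigy. Copying each task before setting the label is a precaution: once the examples live in a list, the same dict objects would otherwise be reused for every label.

```python
import random


def make_stream(examples, labels, seed=None):
    """Yield every example once per label, in one fixed random order."""
    examples = list(examples)  # load everything into memory
    random.Random(seed).shuffle(examples)  # shuffle once, before the label loop
    for label in labels:
        for eg in examples:
            task = dict(eg)  # shallow copy so earlier tasks keep their own label
            task["label"] = label
            yield task


# Plain dicts standing in for the tasks a loader such as JSONL would produce
images = [{"image": f"img_{i}.jpg"} for i in range(4)]
tasks = list(make_stream(images, ["FIRST_LABEL", "SECOND_LABEL"], seed=0))
```

Inside the recipe, get_stream() could then call something like make_stream(JSONL(file_path), labels) instead of reloading the file for each label.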

The only downside here is that it requires all examples to be loaded into memory once, so if you have a lot of examples, this might add some startup time. If that's a problem, there are some approaches for fake-shuffling a generator in batches (see various examples on Stack Overflow), but I'm not sure if it's worth it in this case.
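For reference, one common way to fake-shuffle a generator in batches is sketched below. This is a generic pattern rather than anything built into Prodigy; shuffle_in_batches, batch_size and seed are illustrative names. Only one batch is held in memory at a time, and an example never moves outside its own batch, so the result is only locally random.

```python
import random
from itertools import islice


def shuffle_in_batches(stream, batch_size=1000, seed=None):
    """Read the stream in fixed-size chunks and shuffle each chunk on its own."""
    rng = random.Random(seed)
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))  # take up to batch_size items
        if not batch:
            return  # the stream is exhausted
        rng.shuffle(batch)
        yield from batch


# Small demonstration with numbers instead of annotation tasks
demo = list(shuffle_in_batches(range(10), batch_size=4, seed=1))
```

A larger batch size gives a more thorough shuffle at the cost of more memory and a longer wait before the first example is available.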

Thank you very much! That's great.

Just one last question, because I have now tried it several times.
Since the order is now random, there is less of a visual cue that the next batch of labeling has started. Is it possible to trigger the annotation instructions from the for loop, to give a hint like "label has changed"?

So far I found this post: Instructions pop over on load [Nov '19]. Could I trigger this click in the for loop?
Or would there be another option to "print" some kind of additional information in the web interface?

Best, Matt

That's a good point! I can think of a few ways to solve it and it kinda depends on your preference and how elegant you want things to be.

One (hacky) approach would be to just add an image or text task in between that says "Label is changing!" or something like that. You could mark it with a special key in the JSON and then exclude it afterwards when you export the data.
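A rough sketch of that first approach, again with plain dicts in place of real tasks: the "is_marker" key and both helper names are made up for this example, and how a text-only task renders in the classification interface is an assumption worth checking in your setup.

```python
def stream_with_markers(examples, labels):
    """Yield all examples per label, with a notice task before each new label."""
    examples = list(examples)
    for index, label in enumerate(labels):
        if index > 0:
            # Hypothetical marker task; "is_marker" is an arbitrary key chosen here
            yield {"text": f"Label is changing! Next label: {label}", "is_marker": True}
        for eg in examples:
            task = dict(eg)  # copy so each label gets its own task dict
            task["label"] = label
            yield task


def drop_markers(annotations):
    """Filter the marker tasks out again, for example after exporting the dataset."""
    return [eg for eg in annotations if not eg.get("is_marker")]


tasks = list(stream_with_markers(
    [{"image": "a.jpg"}, {"image": "b.jpg"}],
    ["FIRST_LABEL", "SECOND_LABEL"],
))
real_tasks = drop_markers(tasks)
```

The annotator can accept or skip the marker task; either way it is identifiable by the special key when the data is exported.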

Another option would be to add some custom JavaScript that runs whenever a task is updated and checks whether the label has changed, compared to the previous label. If it has, you can then show an alert in the browser. Haven't tested this yet, but something like this in the "javascript" returned by your recipe "config" should work:

let prevLabel = null

document.addEventListener('prodigyupdate', event => {
    // This runs every time the task changes
    const label = window.prodigy.content.label
    if (label !== prevLabel) {  // label has changed 
        alert('Attention: The label has changed!')
    }
    prevLabel = label
})
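To hook this into the recipe, the script can be kept in a Python string and returned under the "javascript" key of the "config", as described above. The sketch below only builds the dictionary of components; build_components and LABEL_CHANGE_JS are names invented for this illustration, and the script itself is the same untested snippet.

```python
# The JavaScript from above, stored as a string (still untested)
LABEL_CHANGE_JS = """
let prevLabel = null

document.addEventListener('prodigyupdate', event => {
    const label = window.prodigy.content.label
    if (label !== prevLabel) {
        alert('Attention: The label has changed!')
    }
    prevLabel = label
})
"""


def build_components(dataset, stream):
    """Return the recipe components, with the custom script added to the config."""
    blocks = [{"view_id": "classification"}]
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",
        "config": {"blocks": blocks, "javascript": LABEL_CHANGE_JS},
    }


components = build_components("my_dataset", iter([]))
```

In the recipe shared earlier in this thread, this amounts to adding one more entry to the existing "config" dictionary.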

Thank you so much for your help!
I already tried the JSON approach, which works well. I will also try the JavaScript way.

Best, Matt
