Labelling a set of images (classification)

Love this reply. Thank you for the detail. It was enough for me to get what I needed working.

I thought writing up a custom recipe was going to be super fiddly / hard, but with the example overlap it wasn't actually too bad :slight_smile:

I'll probably write up the things I learnt as a blog post, if only for future me to remember what I did.

The only thing I haven't been able to get working is the exclude_by setting in the config. I set it to input instead of task. You can see in the following code that I've temporarily disabled the get_random_stream functionality in order to debug. I still keep getting images that I've already annotated, even though, when I export the annotations to check, the hashes are identical for both input and task.

Any idea what I'm doing wrong there?

import prodigy
import random
from prodigy.components.loaders import Images

LABEL = "some_label"

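def before_db(examples):
    # Store the file path instead of the base64 data URI,
    # so the encoded image isn't saved to the database.
    for eg in examples:
        if eg["image"].startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples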

@prodigy.recipe("classify-images")
def classify_images(dataset, source):
    # def get_random_stream():
    #     stream = Images(source)
    #     for eg in stream:
    #         if random.random() < 0.05:  # or whatever
    #             yield eg

    def get_stream():
        # stream = get_random_stream()
        stream = Images(source)
        for eg in stream:
            eg["label"] = LABEL
            yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "classification",
        "before_db": before_db,
        "config": {
            "choice_style": "single",
            "exclude_by": "input"
        }
    }
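
For reference, this is roughly the behaviour I was expecting from exclude_by set to input. It's only a minimal sketch of the idea, not how Prodigy actually implements it: seen_input_hashes and filter_seen are hypothetical helpers I made up, the hash values are invented, and the exported lines stand in for what I'd get from exporting the dataset. The only thing taken from the real export is the _input_hash / _task_hash field names.

```python
import json


def seen_input_hashes(jsonl_lines):
    """Hypothetical helper: collect the _input_hash of every exported annotation."""
    hashes = set()
    for line in jsonl_lines:
        line = line.strip()
        if line:  # skip blank lines in the export
            hashes.add(json.loads(line)["_input_hash"])
    return hashes


def filter_seen(stream, seen):
    """Hypothetical helper: yield only examples whose input hash is not in `seen`."""
    for eg in stream:
        if eg.get("_input_hash") not in seen:
            yield eg


# Stand-in for the exported annotations (hash values are invented).
exported = [
    '{"path": "a.jpg", "label": "some_label", "_input_hash": 111, "_task_hash": 222, "answer": "accept"}',
]

# Stand-in for the incoming stream, with hashes already assigned.
stream = [
    {"path": "a.jpg", "_input_hash": 111},
    {"path": "b.jpg", "_input_hash": 333},
]

seen = seen_input_hashes(exported)
remaining = list(filter_seen(stream, seen))
```

In other words, I'd expect a.jpg to be dropped here because its input hash is already in the dataset, and only b.jpg to come through for annotation.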