Hi,
I am developing a custom Prodigy recipe to label my own images in a specific folder. My approach is largely based on the example in the "Assigning multiple labels to images" section of "Computer Vision · Prodigy · An annotation tool for AI, Machine Learning & NLP."
I have made several modifications to this example recipe. First, I added a "before_db" function that removes the base64 representation of the image from the output JSONL and substitutes the corresponding image file path. Second, I changed the "classify_images" function so that users can choose which labels to use from the command line. The command I use to begin annotating looks something like this:
$ python -m prodigy classify-images image_dataset ./images --label "A,B,C" -F image_multilabel_recipe.py
While the recipe appears to be working, I have encountered an issue during the image labelling process. I have noticed that certain images randomly appear multiple times, meaning that the same image will be annotated more than once. For example, if I only have 100 images in a folder I will end up with something like 130 annotations because some images have randomly appeared more than once.
I am uncertain about the root cause of this issue. It is worth noting that my image folder contains a mix of PNG and JPEG files. My custom recipe file is included below for reference.
Any help or suggestions would be much appreciated!
import prodigy
from prodigy.components.loaders import Images
from prodigy.util import split_string
from typing import List


@prodigy.recipe("classify-images", label=("Comma-separated label(s)", "option", "l", split_string))
def classify_images(dataset, source, label: List[str]):
    # Build the choice options from the labels passed on the command line
    OPTIONS = []
    number = 0
    for category in label:
        OPTIONS.append({"id": number, "text": category})
        number += 1
    # OPTIONS=label

    def get_stream():
        # Load the directory of images and add options to each task
        stream = Images(source)
        for eg in stream:
            eg["options"] = OPTIONS
            # eg = eg["path", "options", "accept", "answer"]
            yield eg

    def before_db(examples):
        for eg in examples:
            # If the image is a base64 string and the path to the original file
            # is present in the task, remove the image data
            if eg["image"].startswith("data:") and "path" in eg:
                eg["image"] = eg["path"]
        return examples

    return {
        "before_db": before_db,
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "choice",
        "config": {
            "choice_style": "multiple",  # or "single"
            # Automatically accept and submit the answer if an option is
            # selected (only available for single-choice tasks)
            "choice_auto_accept": False
        }
    }
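
To quantify the repeats, something like the following sketch could be run over the exported annotations. It is not part of the recipe. It assumes each exported record is one JSON object per line and that the file path ends up under the "image" key, as the before_db function above intends. The helper name count_repeated_images and the sample records are made up for illustration.

```python
import json
from collections import Counter


def count_repeated_images(jsonl_lines, key="image"):
    """Return {image: count} for every image that occurs more than once.

    Hypothetical helper: assumes one JSON object per line, with the image
    file path stored under `key` (as written by before_db in the recipe).
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get(key)] += 1
    return {image: n for image, n in counts.items() if n > 1}


# Made-up sample records standing in for an exported dataset
sample = [
    '{"image": "images/a.png", "accept": [0]}',
    '{"image": "images/b.jpg", "accept": [1]}',
    '{"image": "images/a.png", "accept": [0]}',
]
repeated = count_repeated_images(sample)
print(repeated)
```

With a real export, the lines of the exported JSONL file (for example, an open file handle) could be passed in place of the sample list, which would show which image paths were annotated more than once and how often.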
