Hi,
I am developing a custom Prodigy recipe to label my own images in a specific folder. My approach is largely based on the example in the "Assigning multiple labels to images" section of "Computer Vision · Prodigy · An annotation tool for AI, Machine Learning & NLP."
I have made several modifications to the example recipe. I added a "before_db" function that removes the base64 representation of the image from the output JSONL and substitutes the corresponding image file path. I also changed the "classify_images" function so that users can choose the labels they want to use on the command line. The command I use to begin annotating looks something like:
$ python -m prodigy classify-images image_dataset ./images --label "A,B,C" -F image_multilabel_recipe.py
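To make the effect of the --label argument concrete, here is a minimal sketch of how "A,B,C" ends up as the options list attached to each task. The helper name build_options is hypothetical, and plain str.split is used as a stand-in for prodigy.util.split_string, which is assumed here to split on commas and strip whitespace.

```python
def build_options(label_arg):
    """Turn a comma-separated label string into choice options.

    Stand-in for what the recipe does after split_string has parsed
    the --label value (assumption: comma split plus whitespace strip).
    """
    labels = [part.strip() for part in label_arg.split(",") if part.strip()]
    # Each option gets a numeric id and the label as its display text
    return [{"id": i, "text": text} for i, text in enumerate(labels)]


options = build_options("A,B,C")
print(options)
```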
While the recipe appears to be working, I have run into an issue during labelling: certain images appear multiple times, seemingly at random, so the same image gets annotated more than once. For example, with only 100 images in the folder, I end up with something like 130 annotations because some images have been shown more than once.
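One way to confirm which images are repeated is to count how often each file path occurs in the exported annotations. The following is a minimal sketch, assuming the dataset has been exported to a JSONL file (for example with "prodigy db-out image_dataset > annotations.jsonl") and that each record keeps the "path" key added by the loader. The helper names load_jsonl and find_repeated_paths and the sample records are hypothetical.

```python
import json
from collections import Counter


def load_jsonl(file_path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(file_path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_repeated_paths(records, key="path"):
    """Return {path: count} for every path that occurs more than once."""
    counts = Counter(rec[key] for rec in records if key in rec)
    return {path: n for path, n in counts.items() if n > 1}


# Made-up records for illustration; real ones would come from
# load_jsonl("annotations.jsonl")
sample = [
    {"path": "images/a.png", "accept": [0]},
    {"path": "images/b.jpg", "accept": [1]},
    {"path": "images/a.png", "accept": [0, 2]},
]
repeated = find_repeated_paths(sample)
print(repeated)
```

Comparing the repeated paths against the file types or the order of annotation may help narrow down whether the repeats are tied to particular images or to particular sessions.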
I am uncertain about the root cause of this issue. It is worth noting that my image folder consists of a combination of PNG and JPEG files, and I'll attach my custom recipe file for your reference.
Any help or suggestions would be much appreciated!
import prodigy
from prodigy.components.loaders import Images
from prodigy.util import split_string
from typing import List


@prodigy.recipe("classify-images", label=("Comma-separated label(s)", "option", "l", split_string))
def classify_images(dataset, source, label: List[str]):
    OPTIONS = []
    number = 0
    for category in label:
        OPTIONS.append({"id": number, "text": category})
        number += 1
    # OPTIONS=label

    def get_stream():
        # Load the directory of images and add options to each task
        stream = Images(source)
        for eg in stream:
            eg["options"] = OPTIONS
            # eg = eg["path", "options", "accept", "answer"]
            yield eg

    def before_db(examples):
        for eg in examples:
            # If the image is a base64 string and the path to the original file
            # is present in the task, remove the image data
            if eg["image"].startswith("data:") and "path" in eg:
                eg["image"] = eg["path"]
        return examples

    return {
        "before_db": before_db,
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "choice",
        "config": {
            "choice_style": "multiple",  # or "single"
            # Automatically accept and submit the answer if an option is
            # selected (only available for single-choice tasks)
            "choice_auto_accept": False
        }
    }