How do you specify --remove-base64 when using custom recipes?

I tried adding '--remove-base64' to a custom recipe:

prodigy custom_image_categorization productivity_data_lite input.jsonl --label "good, bad, ugly, unknown" --remove-base64 -F user_activity_categorization.py

and was met with the following error:

unrecognized arguments: --remove-base64

This is because the flag is only implemented for certain image-based recipes and is not inherited by every recipe. If you'd like the same functionality in a custom recipe, you can implement it yourself by adding a '--remove-base64' flag to the recipe's arguments, together with a before_db callback that strips the image data:

import prodigy
from prodigy.util import split_string
from typing import List, Optional


@prodigy.recipe(
    "custom_image_categorization",
    dataset=("The dataset to save to", "positional", None, str),
    jsonl_path=("Path to the JSONL file", "positional", None, str),
    labels=("Comma-separated list of labels", "option", "l", split_string),
    # Added: a boolean flag. The argument name must be a valid Python
    # identifier, so it uses an underscore rather than a hyphen.
    remove_base64=("Remove base64-encoded images before saving", "flag", "B", bool),
)
def custom_image_categorization(
    dataset: str,
    jsonl_path: str,
    labels: Optional[List[str]] = None,
    remove_base64: bool = False,
):
    "....."
    # See docs
    def before_db(examples):
        if remove_base64:
            for eg in examples:
                # If the image is a base64 string and the path to the original
                # file is present in the task, remove the image data
                if eg["image"].startswith("data:") and "path" in eg:
                    eg["image"] = eg["path"]
        return examples
    # before_db only takes effect if it is also returned under the
    # "before_db" key of the recipe's components dictionary.
    "....."

Hi @MLOops!

As you are using a custom recipe, you will need to provide the implementation of both the CLI flag and the helper function yourself. You can copy both from the recipe source code, for example:

import prodigy
from prodigy.components.loaders import Images
from prodigy.util import split_string
from prodigy.types import TaskType
from typing import List, Optional


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe(
    "custom_image.manual",
    dataset=("The dataset to use", "positional", None, str),
    source=("Path to a directory of images", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
    exclude=("Names of datasets to exclude", "option", "e", split_string),
    darken=("Darken image to make boxes stand out more", "flag", "D", bool),
    remove_base64=("Remove base64-encoded images before saving", "flag", "B", bool)
)
def image_manual(
    dataset: str,
    source: str,
    label: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
    darken: bool = False,
    remove_base64: bool = False
):
    """
    Manually annotate images by drawing rectangular bounding boxes or polygon
    shapes on the image.
    """
    # Load a stream of images from a directory and return a generator that
    # yields a dictionary for each example in the data. All images are
    # converted to base64-encoded data URIs.
    stream = Images(source)
    
    def before_db(examples: List[TaskType]) -> List[TaskType]:
        # Remove all data URIs before storing example in the database
        for eg in examples:
            if eg["image"].startswith("data:"):
                eg["image"] = eg.get("path")
        return examples

    return {
        "view_id": "image_manual",  # Annotation interface to use
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": stream,  # Incoming stream of examples
        "before_db": before_db if remove_base64 else None,
        "exclude": exclude,  # List of dataset names to exclude
        "config": {  # Additional config settings, mostly for app UI
            "label": ", ".join(label) if label is not None else "all",
            "labels": label,  # Selectable label options,
            "darken_image": 0.3 if darken else 0,
        },
    }

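Note that the argument is named remove_base64 (with an underscore) both in the decorator and as a parameter of image_manual, and that the returned dictionary only sets "before_db" to the callback when the flag was passed; otherwise it is None and the examples are saved unchanged.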
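
If you want to check what the callback does before wiring it into the recipe, here is a minimal standalone sketch in plain Python with no Prodigy import. The task dictionaries are made up for illustration, and this variant keeps the image untouched when no "path" key is present (as in the question's version), which is an assumption about the behaviour you want.

```python
from typing import Any, Dict, List


def before_db(examples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Replace base64 data URIs with the original file path before saving.
    # Examples without a "path" key are left as they are.
    for eg in examples:
        if eg["image"].startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples


# Hypothetical annotated tasks: one with an embedded data URI plus a path,
# one that already points to a URL and has no path.
examples = [
    {"image": "data:image/png;base64,iVBORw0KGgo=", "path": "images/cat.png", "answer": "accept"},
    {"image": "https://example.com/dog.png", "answer": "accept"},
]

cleaned = before_db(examples)
print(cleaned[0]["image"])
print(cleaned[1]["image"])
```

The first example ends up with its file path in "image", while the second is passed through unchanged because it never contained a data URI.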