Annotating Multiple Audio Files in the Same Session (one after another)

Is it possible to annotate multiple audio files in the same session (one after another)? For example, I have 10 jobs I want to annotate, each with its respective audio.mp3 and speaker_data.txt (contained in separate local folders named after their respective job_id). My recipe.py looks like the following:

import os
import prodigy
from prodigy.components.loaders import Audio

@prodigy.recipe("speaker-audio-manual")
def speaker_audio_manual(dataset: str, jobs_folder: str = "Jobs"):
    """
    A custom Prodigy recipe for annotating speaker data across multiple job folders.

    Args:
        dataset (str): The name of the Prodigy dataset.
        jobs_folder (str): The path to the directory containing job folders.

    Returns:
        dict: A Prodigy configuration dictionary.
    """

    def get_audio_examples(jobs_folder):
        # Iterate over job folders and prepare the examples
        for job_id in os.listdir(jobs_folder):
            job_folder_path = os.path.join(jobs_folder, job_id)

            # Paths for speaker data and recording
            speaker_data_path = os.path.join(job_folder_path, "speaker_data.txt")
            recording_path = os.path.join(job_folder_path, "audio.mp3")

            # Ensure both files exist
            if os.path.exists(speaker_data_path) and os.path.exists(recording_path):
                # Create an example for Prodigy
                yield {
                    "audio": recording_path,
                    "meta": {"job_id": job_id},
                    "options": [
                        {"id": "SPEAKER_A", "text": "Speaker A"},
                        {"id": "SPEAKER_B", "text": "Speaker B"},
                        {"id": "SPEAKER_C", "text": "Speaker C"},
                        {"id": "SPEAKER_D", "text": "Speaker D"},
                        {"id": "SPEAKER_E", "text": "Speaker E"}
                    ]
                }
            else:
                print(f"Missing files in {job_folder_path}. Skipping this folder.")

    # Load the examples from the job folders
    examples = get_audio_examples(jobs_folder)

    return {
        "dataset": dataset,
        "stream": examples,
        "view_id": "audio_manual",
        "config": {
            "audio_loop": True,
            "labels": ["SPEAKER_A", "SPEAKER_B", "SPEAKER_C", "SPEAKER_D", "SPEAKER_E"],
        }
    }

However, when I run the following command in the terminal, it lets me click the check button to move on to the next job, but none of the tasks actually display the audio file (i.e. there is no waveform that I can drag speaker labels onto). Furthermore, the bottom of the UI correctly displays the job_id and the file path of speaker_data.txt, but there is no mention of audio.mp3.

prodigy speaker-audio-manual dataset Jobs -F recipe.py

Welcome to the forum @alexp :wave:

What you're trying to do is definitely possible. The main issue in your recipe is that the audio data is not being loaded correctly. You won't be able to load media from local file paths because the browser will block them, which is why you can't see the audio rendering in the Prodigy UI. You will have to convert your audio files into base64-encoded data instead.
You can use the fetch_media helper from prodigy.components.preprocess for this. After creating your examples iterator, you can add the following:

from prodigy.components.preprocess import fetch_media

examples = fetch_media(examples, ["audio"], skip=True)
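
In your recipe, that would slot in right where you create the stream (a minimal sketch based on your recipe above; skip=True is meant to skip tasks whose files can't be fetched instead of raising an error):

# Load the examples from the job folders, then replace the local
# file paths with base64-encoded data the browser can render
examples = get_audio_examples(jobs_folder)
examples = fetch_media(examples, ["audio"], skip=True)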

If your files are big, the encoding will result in sizeable strings, so you might want to remove the encoded data before saving to the database to avoid DB bloat. You can use the before_db callback for this (the example in the docs actually shows how to remove the encoded image data; it's exactly the same for audio, you only need to change the key from image to audio).
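
Adapted for audio, a minimal sketch of that callback could look like this (it assumes, as in the docs example, that the original file path is still available on the task under a "path" key; with a custom stream like yours you could also stash it there yourself when yielding):

def before_db(examples):
    for eg in examples:
        # Replace the (potentially huge) base64 data URI with the
        # original file path before the answers are saved
        if eg["audio"].startswith("data:") and "path" in eg:
            eg["audio"] = eg["path"]
    return examples

You'd then add "before_db": before_db to the dictionary returned by your recipe.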

Is your intention to provide a multiple choice question for the users? I'm asking because you're adding options in the get_audio_examples function. The audio_manual UI lets you mark regions on the waveform, and for that it's enough to provide the list of labels in the config, like you did.
If you need to add multiple choice questions, you would have to use the blocks UI instead, which combines audio_manual with choice, as sketched below.
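
For example, the dictionary returned by your recipe could look roughly like this (a sketch, assuming you keep the per-task options in your stream and the labels in the config):

return {
    "dataset": dataset,
    "stream": examples,
    "view_id": "blocks",
    "config": {
        "blocks": [
            {"view_id": "audio_manual"},
            # text: None prevents the task text from being rendered twice
            {"view_id": "choice", "text": None},
        ],
        "labels": ["SPEAKER_A", "SPEAKER_B", "SPEAKER_C", "SPEAKER_D", "SPEAKER_E"],
        "audio_loop": True,
    },
}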

Finally, if your data is already organized in meaningful folders, you might consider using our new pages UI, which leverages the folder structure to group tasks into meaningful collections. Make sure to use the audio loader with it. See here for more info on loading with pages: Loaders and Input Data · Prodigy · An annotation tool for AI, Machine Learning & NLP

Let me know if you need any further assistance!