Audio classification dataset

nhenrys · August 10, 2021, 1:12pm

Hello, new user here. I want to use Prodigy for audio file classification.

I'm following the provided custom recipe example:

import prodigy
from prodigy.components.loaders import Audio

@prodigy.recipe("classify-audio")
def classify_audio(dataset, source):
    def get_stream():
        # Load the directory of audio files and add options to each task
        stream = Audio(source)
        for eg in stream:
            eg["options"] = [
                {"id": "CAR", "text": "🚗 Car"},
                {"id": "PLANE", "text": "✈️ Plane"},
                {"id": "OTHER", "text": "Other / Unclear"}
            ]
            yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "choice",
        "config": {
            "choice_style": "single",  # or "multiple"
            "choice_auto_accept": True,
            "audio_loop": True,
            "show_audio_minimap": False
        }
    }

When I export the dataset using db-out, every row contains thousands of seemingly random characters, like this
('......' stands in for an enormous number of characters):

{"audio":"data:audio/x-wav;base64,UklGRiQ6IAB.........../r/+f/4//j/9//2//b/9//3//j/+f8=","text":"EM2010-00504-2021-08-10T07-46-23-058dB","meta":{"file":"EM2010-00504-2021-08-10T07-46-23-058dB.wav"},"path":"recordings/EM2010-00504-2021-08-10T07-46-23-058dB.wav","options":[{"id":"CAR","text":"\ud83d\ude97 Car"},{"id":"PLANE","text":"\u2708\ufe0f Plane"},{"id":"OTHER","text":"Other / Unclear"}],"_input_hash":928286171,"_task_hash":-1137344558,"_session_id":null,"_view_id":"choice","config":{"choice_style":"single"},"accept":["OTHER"],"audio_spans":[],"answer":"accept"}

Is there a way to avoid this?

Thanks in advance for your help.

ines · August 11, 2021, 12:11am

Hi! The string here is the base64-encoded data of the audio file (basically, the file's contents as a string), which is what the Audio loader produces by default. This lets you send local files via the REST API without having to host them somewhere (since modern browsers don't let you load files from local paths), and it stores the data with the annotations so you don't lose the reference to the original file.
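To make this concrete, here is a minimal sketch, using only Python's standard library and not Prodigy's own code, of how raw audio bytes turn into a data URI like the "audio" value in the exported row above, and how to decode it again. The helper names to_data_uri and from_data_uri are hypothetical. Because every WAV file begins with the bytes "RIFF", its base64 form always starts with "UklGR", which matches the start of the string in the question.

```python
import base64


def to_data_uri(audio_bytes: bytes, mime: str = "audio/x-wav") -> str:
    """Encode raw audio bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def from_data_uri(uri: str) -> bytes:
    """Decode a base64 data URI back into the original raw bytes."""
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(encoded)


if __name__ == "__main__":
    # The first 12 bytes of a WAV header: "RIFF", a 4-byte size, then "WAVE".
    header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
    uri = to_data_uri(header)
    print(uri)
    # Decoding recovers exactly the bytes we started with.
    assert from_data_uri(uri) == header
```

Every 3 bytes of audio become 4 characters of text, so the encoded string is roughly a third larger than the file itself, which is why the exported rows get so long.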

However, for large files this can get inconvenient and bloat both the database and the exported data. One solution is to replace the Audio loader with the AudioServer loader, which serves the files via Prodigy's web server instead. An alternative is to host the files in an S3 bucket or similar and load in the URLs. Finally, you could add a before_db callback that removes the base64 string before the examples are added to the database; see the Custom Recipes page of the Prodigy docs for an example.
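The before_db option can be sketched as a plain Python function that receives the list of annotated examples and returns them modified. This is a minimal sketch, assuming each task still carries the "path" field the Audio loader added (as in the exported row in the question); it swaps the inline data URI for that path and leaves any other audio value, such as a URL, untouched.

```python
from typing import Dict, List


def before_db(examples: List[Dict]) -> List[Dict]:
    """Replace inline base64 audio with the original file path before saving.

    Assumes each task has a "path" key, as produced by the Audio loader.
    """
    for eg in examples:
        audio = eg.get("audio", "")
        # Only rewrite tasks whose audio is an inline data URI and that
        # still record where the original file lives.
        if audio.startswith("data:") and "path" in eg:
            eg["audio"] = eg["path"]
    return examples
```

In the recipe from the question, you would define this function inside classify_audio and add "before_db": before_db to the dictionary it returns, next to "dataset" and "stream". Note that the stored examples then no longer contain the audio itself, only a reference to it.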

If you serve the files or strip out the base64 data, just make sure you keep the original files and file names. Your annotations will include the file paths, but if the files are ever moved or renamed, you may lose the references and won't be able to reconstruct the data.
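One way to catch broken references early is to check an export against the files on disk. The sketch below is a hypothetical helper, not part of Prodigy: it reads a JSONL export (one JSON task per line, as db-out produces) and returns every "path" value that no longer points to an existing file, assuming paths are relative to base_dir.

```python
import json
from pathlib import Path
from typing import List


def find_missing_audio(jsonl_path: str, base_dir: str = ".") -> List[str]:
    """Return "path" values from a JSONL export whose files no longer exist.

    Assumes one JSON task per line with a "path" key relative to base_dir
    (absolute paths also work, since Path joining keeps them as-is).
    """
    missing = []
    with open(jsonl_path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            task = json.loads(line)
            path = task.get("path")
            if path and not (Path(base_dir) / path).is_file():
                missing.append(path)
    return missing
```

For example, after exporting with prodigy db-out your_dataset > annotations.jsonl (the dataset name here is a placeholder), calling find_missing_audio("annotations.jsonl") from the directory that contains recordings/ should return an empty list if every file is still in place.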
