ASR text correct/edit recipe

workworkwork · July 28, 2022, 9:23pm

I have CSVs with columns:
mp3, start, end, text

Is there a Prodigy recipe to QC/edit those text strings while listening to the corresponding mp3 segment?

ryanwesslen · July 28, 2022, 10:29pm

For ASR / transcription, the audio.transcribe recipe does just that.

You'll need to convert your data to the audio/video input format (e.g., a URL to the video). Your URL column would need to be named "video", and you can try loading a .csv, although .jsonl is preferred.

I would suggest looking through the Audio / Video documentation as well.

Let me know if you have any further questions!

workworkwork · July 29, 2022, 6:52am

Glad to see it is straightforward.

What field names should I use for:
video, start_second, end_second, existing_asr_text

I can create a jsonl file, but I can't seem to pin down documentation on clip sub-segments in seconds etc.

ryanwesslen · July 29, 2022, 1:20pm

Are the records (e.g., URL) in video unique? If so, can you explain the role of start_second and end_second?

If they are not unique, could you use the start_second and end_second fields to make them unique by creating a custom loader (python script)?

If you are able to give me an example of sample data, I could try to create an example.

workworkwork · July 29, 2022, 4:11pm

The URL records are not unique.

Each csv is one mp3 (they can be hours long).
Each row is one segment (< 20 seconds).

ryanwesslen · July 29, 2022, 7:25pm

The most efficient approach may be to create a Python script as a custom loader using pydub to slice the segments by looping through each start_second and end_second pair. Make sure you return an iterable stream for your loader.

Alternatively, you could use pydub to create a directory with a separate .mp3 file for each segment. Then you could use audio.transcribe to load the directory without needing a custom loader.
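As a rough sketch of that second option (not an official recipe), something like the following could read a CSV with the columns from the first post (mp3, start, end, text) and export one clip per row. It assumes start and end are in seconds, and that pydub plus ffmpeg are installed. The function names plan_segments and export_segments and the output file naming are made up for illustration.

```python
import csv
from pathlib import Path


def plan_segments(csv_path, out_dir):
    """Read rows (mp3, start, end, text) and decide where each clip goes.

    Assumes start/end are given in seconds; pydub slices in milliseconds.
    """
    out_dir = Path(out_dir)
    plan = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            # Hypothetical naming scheme: <source stem>_<row index>.mp3
            clip = out_dir / f"{Path(row['mp3']).stem}_{i:05d}.mp3"
            plan.append(
                {
                    "source": row["mp3"],
                    "start_ms": round(float(row["start"]) * 1000),
                    "end_ms": round(float(row["end"]) * 1000),
                    "clip": str(clip),
                    "text": row["text"],
                }
            )
    return plan


def export_segments(plan):
    """Cut every planned clip out of its source mp3 and write it to disk."""
    # Imported here so plan_segments works without pydub/ffmpeg installed.
    from pydub import AudioSegment

    loaded = {}  # decode each (possibly hours-long) mp3 only once
    for seg in plan:
        if seg["source"] not in loaded:
            loaded[seg["source"]] = AudioSegment.from_file(seg["source"], format="mp3")
        Path(seg["clip"]).parent.mkdir(parents=True, exist_ok=True)
        piece = loaded[seg["source"]][seg["start_ms"]:seg["end_ms"]]
        piece.export(seg["clip"], format="mp3")


# Example usage (paths are placeholders):
# plan = plan_segments("talk.csv", "clips")
# export_segments(plan)
```

Keeping the planning step separate from the export means the row-to-clip mapping (including the existing ASR text) is available later when you build the file that Prodigy loads.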

workworkwork · July 29, 2022, 9:14pm

So if I split them up myself, what do the field names look like in the json/csv?

video and text (for the existing asr text to show up in the editor)

Then just the basic audio.transcribe recipe with --video?

ryanwesslen · August 1, 2022, 1:43pm

Take a look at the audio.transcribe recipe documentation for details.

Yep - your video field could hold the path/URL to each individual file (i.e., one file per row of your original .csv, each a unique audio cut based on its start and end times).

Then you can add --field-id to your command and give it the name of the field that holds your existing text. Alternatively, the default is transcript, so if you rename your text field to transcript, it will be read without you having to specify it.
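To make that concrete, here is a minimal sketch of writing such a .jsonl file, one task per clip. The input rows (dicts with "clip" and "text" keys) and the helper names are assumptions for illustration. The media key follows this thread ("video"); for plain audio files the expected key may be "audio" instead, so please check the Audio / Video docs for your version.

```python
import json


def to_tasks(rows, media_key="video", text_key="transcript"):
    """Turn rows with a clip path and existing ASR text into task dicts.

    media_key follows this thread; text_key matches the default field
    name mentioned above, so no extra flag should be needed for it.
    """
    for row in rows:
        yield {media_key: row["clip"], text_key: row["text"]}


def write_jsonl(tasks, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")


# Example usage (file names are placeholders):
# rows = [{"clip": "clips/talk_00000.mp3", "text": "hello world"}]
# write_jsonl(to_tasks(rows), "tasks.jsonl")
#
# Assumed invocation, with a hypothetical dataset name; confirm the
# flags with `prodigy audio.transcribe --help` before relying on them:
#   prodigy audio.transcribe my_dataset tasks.jsonl --loader jsonl
```

If you keep a different name for the text field, pass it as text_key here and give the same name to --field-id.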

Hopefully now you have everything you need to get started!

jai · August 8, 2022, 8:25pm

Similar question related to this topic: if I have start and end times for individual words, would I be able to bold, italicize, or otherwise mark the words as the audio plays, perhaps by writing some more code?

ryanwesslen · August 11, 2022, 3:12pm

hi @jai!

Yes, with some custom HTML, you can add whatever is possible with HTML or CSS. What you'll use is an HTML template.

{"html": "<strong>bold</strong> not bold"}

Depending on your stream (the data you're loading), you can reference keys (e.g., title, or nested keys like data.value) with double curly braces {{ }}, as in this example (which includes an image too):

<h2>{{title}}</h2>
<strong>{{data.value}}</strong>
<br />
<img src="{{image}}" />
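For the word-timing case, a minimal sketch of generating the "html" value in Python might look like the block below. It assumes each word is a dict with "word", "start" and "end" keys (times in seconds); that format and the function name words_to_html are invented for illustration. Note that this only produces static markup: highlighting words in sync with playback would additionally need some custom JavaScript in the browser, which is not shown here.

```python
from html import escape


def words_to_html(words, highlight_at=None):
    """Wrap each word in a <span> carrying its timing as data attributes.

    If highlight_at (seconds) falls inside a word's [start, end) range,
    that word is additionally wrapped in <strong>.
    """
    parts = []
    for w in words:
        # Escape the text so characters like < or & cannot break the markup.
        span = (
            f'<span data-start="{w["start"]}" data-end="{w["end"]}">'
            f"{escape(w['word'])}</span>"
        )
        if highlight_at is not None and w["start"] <= highlight_at < w["end"]:
            span = f"<strong>{span}</strong>"
        parts.append(span)
    return " ".join(parts)


# Example usage: build a task dict with an "html" key.
# words = [
#     {"word": "hello", "start": 0.0, "end": 0.4},
#     {"word": "world", "start": 0.4, "end": 0.9},
# ]
# task = {"html": words_to_html(words, highlight_at=0.5)}
```

The data-start and data-end attributes are there so that a script on the page could later look up which word belongs to the current playback time.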

Also, a bit more advanced, my colleague @koaning created this cool example of "bionic" reading (tweet) that highlights only initial characters of words:

Be sure to look at how he created the construct_html function. You'll create your own function based on whatever specifications you want.

https://gist.github.com/koaning/f73503e8cd0a7cddafa206c656068df4

bionic.py

import pyphen

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.db import connect

hyphenator = pyphen.Pyphen(lang="en_US")

def construct_html(text):
    hyphend = hyphenator.inserted(text)

(The preview above is truncated; see the gist link for the full file.)

If you have any issues, feel free to send a code snippet and we can work through the example.
