ASR text correct/edit recipe

I have CSVs with these columns:
mp3, start, end, text

Is there a Prodigy recipe to QC/edit those text strings while listening to the corresponding mp3 segment?

Hi @workworkwork!

For ASR / transcription, the audio.transcribe recipe does just that.

You'll need to convert your data into the format used for audio and video tasks (e.g., a URL to the video). Your URL column would need to be named "video". You can try loading a CSV, although .jsonl is preferred.

I would suggest looking through the Audio / Video documentation as well.

Let me know if you have any further questions!

Glad to see it is straightforward.

What field names should I use for:
video, start_second, end_second, existing_asr_text

I can create a jsonl file, but I can't seem to pin down documentation on clip sub-segments in seconds etc.

Are the records (e.g., URLs) in video unique? If so, can you explain the role of start_second and end_second?

If they are not unique, could you use the start_second and end_second fields to make them unique by creating a custom loader (python script)?

If you are able to give me an example of sample data, I could try to create an example.

The URL records are not unique.

Each CSV is one mp3 (they can be hours long).
Each row is one segment (< 20 seconds).

The most efficient approach may be to create a Python script as a custom loader using pydub to slice the segments by looping through each start_second and end_second pair. Make sure you return an iterable stream for your loader.

Alternatively, you could use pydub to create a directory with a separate .mp3 file for each segment. Then you could point audio.transcribe at that directory without needing a custom loader.
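Both options come down to the same slicing step, so here is a rough sketch that covers them together: it writes one clip per CSV row and yields a task for each. It assumes the CSV columns are mp3, start, end, text with times in seconds, that pydub and ffmpeg are installed, and that the output keys are "video" and "transcript". The function names and the file naming scheme are made up for illustration.

```python
import csv
from pathlib import Path


def seconds_to_ms(value):
    """Convert a seconds value (str or number) to whole milliseconds."""
    return int(round(float(value) * 1000))


def clip_name(mp3_path, start, end):
    """Build a unique file name for one segment from its source and times."""
    stem = Path(mp3_path).stem
    return f"{stem}_{seconds_to_ms(start):09d}_{seconds_to_ms(end):09d}.mp3"


def segment_stream(csv_path, out_dir):
    """Yield one task dict per CSV row, writing each clip into out_dir."""
    # Imported lazily so the helpers above work without pydub installed.
    from pydub import AudioSegment  # third-party; needs ffmpeg for mp3

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    loaded = {}  # cache so an hours-long mp3 is decoded only once
    with open(csv_path, newline="", encoding="utf-8") as f:
        # skipinitialspace copes with a header like "mp3, start, end, text"
        for row in csv.DictReader(f, skipinitialspace=True):
            src = row["mp3"]
            if src not in loaded:
                loaded[src] = AudioSegment.from_mp3(src)
            start_ms = seconds_to_ms(row["start"])
            end_ms = seconds_to_ms(row["end"])
            clip_path = out_dir / clip_name(src, row["start"], row["end"])
            # pydub slices AudioSegment objects by milliseconds
            loaded[src][start_ms:end_ms].export(str(clip_path), format="mp3")
            # "video" and "transcript" are assumed key names, see above
            yield {"video": str(clip_path), "transcript": row["text"]}
```

Because segment_stream is a generator, you can either consume it as a stream in a custom loader or just run it once to fill the directory with clips.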

So if I split them up myself, what do the field names look like in the JSON/CSV?

video and text (for the existing asr text to show up in the editor)

Then just the basic audio.transcribe recipe with —video?

Take a look at the audio.transcribe recipe details in the documentation.

Yep - your video field could hold the path/URL to each individual file (i.e., one file per row of your original .csv, each a unique audio cut based on its start and end times).

Then, you can add --field-id to your command to give the field name of your text. Alternatively, the default is transcript, so if you rename the field to transcript, the recipe will read it without you needing to be explicit.
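To make the field names concrete, here is a small sketch of writing the .jsonl file. The file paths, the sample text and the to_jsonl helper are invented for illustration; the only things taken from this thread are the "video" key and the default "transcript" key.

```python
import json


def to_jsonl(tasks):
    """Serialize task dicts as JSONL: one JSON object per line."""
    return "".join(json.dumps(t, ensure_ascii=False) + "\n" for t in tasks)


# Hypothetical records: one per pre-cut segment file.
tasks = [
    {"video": "clips/ep1_000000000_000012250.mp3", "transcript": "hello and welcome"},
    {"video": "clips/ep1_000012250_000020000.mp3", "transcript": "to the show"},
]

if __name__ == "__main__":
    with open("segments.jsonl", "w", encoding="utf-8") as f:
        f.write(to_jsonl(tasks))
```

With the text stored under transcript, you shouldn't need --field-id at all. If you keep a different name (say text), pass it with --field-id text instead. Please double-check the exact command against the recipe docs for your Prodigy version.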

Hopefully now you have everything you need to get started!

A similar question related to this topic: if I have the start and end times of individual words, would I be able to bold/italicize or otherwise mark each word as the audio plays, perhaps by writing some more code?

Hi @jai!

Yes, with some custom HTML, you can add whatever is possible with HTML or CSS. What you'll use is an HTML template.

{"html": "<strong>bold</strong> not bold"}

Depending on your stream (data you're loading), you can call keys (e.g., title or nested keys like data.value) with double brackets {{ }} like this example (includes an image too):

<br />
<img src="{{image}}" />

Also, a bit more advanced: my colleague @koaning created this cool example of "bionic" reading (tweet) that highlights only the initial characters of words.

Be sure to look at how he created the construct_html function. You'll create your own function based on whatever specifications you want.
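For word-level timings, your own version might look something like the sketch below. This is not the function from the bionic reading example, just a guess at a starting point: it assumes each word comes as a dict with text, start and end (in seconds), and the "word" class and data-start/data-end attribute names are made up.

```python
from html import escape


def construct_html(words):
    """Wrap each word in a <span> carrying its start/end time in seconds."""
    spans = []
    for w in words:
        spans.append(
            f'<span class="word" data-start="{w["start"]:.2f}" '
            f'data-end="{w["end"]:.2f}">{escape(w["text"])}</span>'
        )
    return " ".join(spans)


# Hypothetical word timings for one segment.
words = [
    {"text": "hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.45, "end": 0.9},
]

# The "html" key is what the HTML interface renders.
task = {"html": construct_html(words)}
```

This only produces the markup. Marking the current word while the audio plays would still need a bit of JavaScript that watches the player's current time and toggles a CSS class on the matching span, which isn't shown here.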

If you have any issues, feel free to send a code snippet and we can work through the example.