I have CSVs with columns:
mp3, start, end, text
Is there a prodigy recipe to QC/edit those text strings while listening to the corresponding mp3 segment?
Hi @workworkwork!
For ASR / transcription, the audio.transcribe recipe does just that.
You'll need to convert your data so each record points to the audio/video (e.g., a URL to the video). Your URL column would need to be named "video", and you can try loading a CSV, although .jsonl is preferred.
I would suggest looking through the Audio / Video documentation as well.
Let me know if you have any further questions!
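To make the conversion concrete, here is a minimal sketch of turning a CSV like yours into JSONL, assuming the mp3/start/end/text column names from your question (the csv_to_jsonl name and the choice to also carry start/end along are mine, not from the recipe):

```python
import csv
import json

def csv_to_jsonl(csv_path, jsonl_path):
    """Convert a CSV with mp3/start/end/text columns into JSONL tasks,
    renaming mp3 -> video as discussed above. Adjust column names to
    match your actual data."""
    with open(csv_path, newline="") as f_in, open(jsonl_path, "w") as f_out:
        for row in csv.DictReader(f_in):
            task = {
                "video": row["mp3"],           # path or URL to the media
                "start": float(row["start"]),  # kept for later (custom loader)
                "end": float(row["end"]),
                "transcript": row["text"],     # existing ASR text to edit
            }
            f_out.write(json.dumps(task) + "\n")
```

This just renames and re-serializes; how start/end get used comes up below in the thread.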
Glad to see it is straightforward.
What field names should I use for:
video, start_second, end_second, existing_asr_text
I can create a jsonl file, but I can't seem to pin down documentation on clip sub-segments in seconds etc.
Are the records (e.g., the URLs) in video unique? If so, can you explain the role of the start_second and end_second fields?
If they are not unique, could you use the start_second and end_second fields to make them unique by creating a custom loader (Python script)?
If you can share some sample data, I can try to put together an example.
The URL records are not unique.
Each CSV is one mp3 (they can be hours long); each row is one segment (< 20 seconds).
The most efficient approach may be to create a Python script as a custom loader using pydub to slice the segments, looping through each start_second and end_second pair. Make sure your loader returns an iterable stream.
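A rough sketch of such a loader, assuming the mp3/start/end/text columns from earlier in the thread (the function and helper names are mine, and pydub needs ffmpeg installed to decode mp3s):

```python
import csv
import os
import tempfile

def to_ms(seconds):
    """Convert a seconds value (str or number) to integer milliseconds,
    since pydub slices AudioSegments in milliseconds."""
    return int(round(float(seconds) * 1000))

def segment_stream(csv_path, out_dir=None):
    """Custom loader sketch: slice one long mp3 into per-row clips with
    pydub and yield one task dict per segment."""
    from pydub import AudioSegment  # imported here; requires pydub + ffmpeg

    out_dir = out_dir or tempfile.mkdtemp()
    cache = {}  # avoid re-decoding the same hours-long mp3 for every row
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            if row["mp3"] not in cache:
                cache[row["mp3"]] = AudioSegment.from_mp3(row["mp3"])
            clip = cache[row["mp3"]][to_ms(row["start"]):to_ms(row["end"])]
            clip_path = os.path.join(out_dir, f"segment_{i}.mp3")
            clip.export(clip_path, format="mp3")
            yield {"audio": clip_path, "transcript": row["text"]}
```

Since the function yields task dicts, it already returns the iterable stream mentioned above.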
Alternatively, you could use pydub to create a directory with a separate .mp3 file for each segment. Then you could point audio.transcribe at that directory without needing a loader.
So if I split them up myself, what do the field names look like in the JSON/CSV?
video and text (for the existing ASR text to show up in the editor)?
Then just the basic audio.transcribe recipe with --video?
Take a glance at the audio.transcribe recipe details in the documentation.
Yep - your video field could have the path/URL to your individual files (i.e., one per row in your original .csv, each a unique audio cut based on the start and end times).
Then you can add --field-id to your command with the name of your text field. Alternatively, the default is transcript, so if you rename the field to transcript, it would be read by default without needing to be explicit.
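Putting that together, a task line and command might look something like this (the file paths and dataset name are placeholders, and the exact flags should be checked against the recipe docs):

```
{"video": "clips/clip-0001.mp3", "transcript": "the existing asr text"}
```

```
prodigy audio.transcribe my_transcripts ./clips.jsonl --loader jsonl
```

With the field named transcript, no --field-id is needed.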
Hopefully now you have everything you need to get started!
A similar, related question: if I have start and end times for individual words, would I be able to bold/italicize or otherwise mark the words as the audio plays, perhaps by writing some more code?
hi @jai!
Yes, with some custom HTML, you can add whatever is possible with HTML or CSS. What you'll use is an html template:
{"html": "<strong>bold</strong> not bold"}
Depending on your stream (the data you're loading), you can reference keys (e.g., title, or nested keys like data.value) with double curly braces {{ }}, like in this example (which includes an image too):
<h2>{{title}}</h2>
<strong>{{data.value}}</strong>
<br />
<img src="{{image}}" />
Also, a bit more advanced: my colleague @koaning created this cool example of "bionic" reading (tweet) that highlights only the initial characters of words.
Be sure to look at how he created the construct_html function. You can then create your own function based on whatever specifications you want.
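For a sense of the shape such a helper takes, here is a minimal sketch in the spirit of that example (the name mirrors the construct_html function mentioned above, but this is my simplified version, not the original implementation):

```python
def construct_html(text, n_bold=2):
    """Wrap the first n_bold characters of each word in <strong> tags,
    bionic-reading style. The result goes into a task's "html" field."""
    words = []
    for word in text.split():
        head, tail = word[:n_bold], word[n_bold:]
        words.append(f"<strong>{head}</strong>{tail}")
    return " ".join(words)
```

In a custom loader you would then yield tasks like {"html": construct_html(row["text"])}; for time-synced highlighting during playback you would additionally need some custom JavaScript.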
If you have any issues, feel free to send a code snippet and we can work through the example.