Error with annotation for speaker diarization


I'm using Prodigy to annotate some audio files for speaker diarization. I'm also using the pyannote open-source model, which provides some Prodigy recipes as shown here. I'm running into an issue where the audio file won't load, so I can't annotate it. This happens with both the pyannote.dia.manual recipe and the standard audio.manual recipe. I initially thought the error was caused by spaces in the file names, but that turned out not to be the issue.

These are the commands I've tried running:
prodigy pyannote.dia.manual test_dataset test/
prodigy audio.manual test_dataset test/ --label SPEAKER1,SPEAKER2

How should I go about troubleshooting this?

As a follow-up, it seems like a sample wav file that I downloaded from the internet works, but the wav files I currently have don't. Are there any requirements or restrictions on the types of wav files I can use, like sample rate or something else?

Apparently the audio annotation supports stereo audio but not mono. I could convert all my files to stereo, but is there a better or easier way of doing it, or am I missing something?

Hi! Did you test it with the same files converted to stereo, and did that solve the issue you were having? If so, this might be related, although I'm confused that it'd fail like this and give you a blank UI. That's definitely strange and not ideal.

If it's not related to stereo vs. mono, how large are your audio files and how are you loading them in? By default, Prodigy will encode the file as a base64 string, which is a fine solution for small snippets and means that the original data will be stored with the example (and it makes it easy to create short snippets and stream them in programmatically without having to store them on disk). However, if the file is very large, this can potentially lead to loading issues if it's all sent over REST as a string. In that case, you could try the audio server loader via --loader audio-server, which will serve the files via a local web server. Alternatively, you can also provide them as URLs (e.g. via an S3 bucket) with a JSONL file and --loader jsonl.
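To make the trade-off above a bit more concrete, here's a rough sketch (not an official Prodigy utility) that estimates how large a file becomes once base64-encoded and writes a JSONL file of audio URLs for use with --loader jsonl. The helper names, the example URLs and the output path are hypothetical, and the "audio" key is an assumption about the task format the audio recipes expect, so double-check it against the docs for your version.

```python
import json


def base64_length(n_bytes):
    """Length in characters of n_bytes of data once base64-encoded.

    Standard base64 emits 4 characters for every 3 input bytes (with
    padding), so the encoded string is roughly a third larger than the file.
    """
    return 4 * ((n_bytes + 2) // 3)


def write_audio_jsonl(urls, out_path):
    """Write one JSON object per line, each pointing at an audio URL.

    Assumption: the recipe reads the location from an "audio" key.
    """
    with open(out_path, "w", encoding="utf8") as f:
        for url in urls:
            f.write(json.dumps({"audio": url}) + "\n")


if __name__ == "__main__":
    # A 30 MB wav file turns into a string of about 40 million characters.
    print(base64_length(30 * 1024 * 1024))
    # Hypothetical URLs, e.g. files hosted in an S3 bucket.
    write_audio_jsonl(
        ["https://example.com/audio/call_01.wav",
         "https://example.com/audio/call_02.wav"],
        "audio_urls.jsonl",
    )
```

The resulting file could then be passed in place of the directory, along the lines of prodigy audio.manual test_dataset audio_urls.jsonl --loader jsonl --label SPEAKER1,SPEAKER2.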

Yes, after I converted my files to stereo, instead of mono, it worked fine.
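For anyone hitting the same thing, here's a minimal sketch of how the check and the conversion could be done with Python's built-in wave module. It only handles uncompressed PCM wav files, the function names and file paths are made up for illustration, and it simply copies the single channel into both left and right.

```python
import wave


def count_channels(path):
    """Return the number of channels in a PCM wav file (1 = mono, 2 = stereo)."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels()


def mono_to_stereo(src, dst):
    """Write a stereo copy of a mono wav file by duplicating the channel."""
    with wave.open(src, "rb") as wav_in:
        if wav_in.getnchannels() != 1:
            raise ValueError(f"{src} is not a mono file")
        width = wav_in.getsampwidth()  # bytes per sample
        rate = wav_in.getframerate()
        frames = wav_in.readframes(wav_in.getnframes())

    # Interleave the samples: each mono sample is written twice,
    # once for the left channel and once for the right.
    stereo = bytearray(len(frames) * 2)
    for offset in range(width):
        stereo[offset::2 * width] = frames[offset::width]
        stereo[width + offset::2 * width] = frames[offset::width]

    with wave.open(dst, "wb") as wav_out:
        wav_out.setnchannels(2)
        wav_out.setsampwidth(width)
        wav_out.setframerate(rate)
        wav_out.writeframes(bytes(stereo))


# Example usage with hypothetical paths:
# if count_channels("test/recording.wav") == 1:
#     mono_to_stereo("test/recording.wav", "test_stereo/recording.wav")
```

The sample rate and sample width are carried over unchanged, so only the channel count differs between the two files.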

Thanks for checking, that's helpful! Glad to hear that there's at least a temporary workaround then.

I'll try and reproduce this to figure out what the underlying problem could be. Also, if you have a short sample of a working stereo and non-working mono snippet, that'd be helpful as well (also to double-check that there's nothing else that could be relevant here).