Editing Text and Linking Audio via Annotation Instructions

Hi, we are looking into different platforms for annotation and really like Prodigy from what we have seen so far. There are two features that I wasn't sure Prodigy could support:

  1. Linking a specific audio file to each document
  2. Editing the raw text of each document alongside the annotation task (span categorization and relation labeling)

Our use case is that we have audio files and their transcriptions, and we would like to do span categorization and relation labeling on the transcriptions. However, the audio quality is sometimes poor, so we would like our annotators to fix mistranscribed words. Hence the need to edit the raw text in each document and to display the specific audio for each doc.


hi @jai!

Thanks for your question and welcome to the Prodigy community :wave:

For audio transcription, you could use the audio.transcribe recipe. If all of your audio files are unique, you could load them along with their transcriptions from a file format like .jsonl. You may need a small amount of Python pre-processing, but check out the file loader docs for audio. We had a similar request earlier this week on how to handle a raw .csv file.
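The pre-processing step might look like the sketch below. It assumes a hypothetical CSV with `audio_url` and `transcript` columns, and that each audio file is reachable at a URL the browser can load; the column names and the `to_tasks` / `write_jsonl` helpers are illustrative and not part of Prodigy. The `transcript` key is chosen to match what I understand to be the default text field of audio.transcribe, so the existing transcription should appear pre-filled for correction, but please check the recipe docs for your version.

```python
import csv
import io
import json


def to_tasks(csv_text):
    """Turn CSV rows into one task dict per audio file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    tasks = []
    for row in reader:
        transcript = row["transcript"].strip()
        tasks.append({
            "audio": row["audio_url"],   # assumption: a URL the browser can load
            "transcript": transcript,    # editable text field for corrections
            "text": transcript,          # text used for span/relation labeling
        })
    return tasks


def write_jsonl(tasks, path):
    """Write one JSON object per line (the JSONL format)."""
    with open(path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


# Tiny in-memory example so the sketch runs without any files on disk.
example_csv = (
    "audio_url,transcript\n"
    "https://example.com/a1.wav,the patient reports mild pain\n"
)
example_tasks = to_tasks(example_csv)
```

Calling `write_jsonl(example_tasks, "tasks.jsonl")` would then produce a file you can pass to a recipe as the source.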

You can combine different interfaces into a custom recipe using blocks. So if you wanted, you could combine the interface from audio.transcribe with the one from rel.manual, which would enable labeling spans and relations.
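As a rough sketch of what such a combined recipe could return: the function below builds the components dictionary with three blocks (audio player, editable transcript, relations UI). It is deliberately written without importing Prodigy so it runs on its own; in a real recipe this body would sit inside a function decorated with `@prodigy.recipe`. The names `build_components` and `whitespace_tokens` are hypothetical, the whitespace tokenizer is a stand-in for a proper spaCy tokenizer, and the config keys reflect my understanding of rel.manual rather than a verified setup.

```python
def whitespace_tokens(text):
    """Split on whitespace, keeping character offsets for each token."""
    tokens = []
    start = 0
    for i, word in enumerate(text.split()):
        start = text.index(word, start)
        end = start + len(word)
        # "ws" marks whether the token is followed by whitespace.
        ws = end < len(text) and text[end].isspace()
        tokens.append({"text": word, "start": start, "end": end, "id": i, "ws": ws})
        start = end
    return tokens


def build_components(dataset, stream, labels, span_labels):
    """Return recipe components that stack audio, text input and relations."""
    # The relations interface needs pre-tokenized text.
    stream = ({**eg, "tokens": whitespace_tokens(eg["text"])} for eg in stream)
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",
        "config": {
            "labels": labels,                      # relation labels
            "relations_span_labels": span_labels,  # span labels
            "blocks": [
                {"view_id": "audio"},
                {"view_id": "text_input", "field_id": "transcript", "field_rows": 3},
                {"view_id": "relations"},
            ],
        },
    }
```

Each incoming task is assumed to carry `audio`, `transcript` and `text` keys, with `text` initially equal to the uncorrected transcript.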

The one tricky part is that if the user has to correct a transcription, you'd need to update (refresh) the text passed to rel.manual after the user has corrected the transcription in the text box. Does this sound right?

For this, you'd likely need an update callback written in JavaScript. There is an example of something similar where we show how you can use a button to change existing text to a different case. In theory, I suspect you could do the same with a text box that starts with the original transcription, which the user can then edit or correct. They could then click the button to trigger the callback, which stores the corrected transcript and passes it to rel.manual. I haven't tried this but would be interested to see if it's possible.
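An untested sketch of that idea, in the same spirit as the change-case button example: a small HTML block with a button, plus a JavaScript snippet passed through the recipe config. It assumes the text box writes its value to the `transcript` key of the current task and that the relations UI re-renders after `window.prodigy.update`; neither assumption has been verified here. `REFRESH_JS`, `REFRESH_BUTTON_BLOCK` and `with_refresh_button` are hypothetical names.

```python
import copy

# JavaScript run in the browser: copy the edited transcript into "text",
# re-tokenize on whitespace, and reset spans/relations that pointed at old tokens.
REFRESH_JS = r"""
function refreshText() {
  const corrected = window.prodigy.content.transcript;
  const words = corrected.split(/\s+/).filter(w => w.length > 0);
  let offset = 0;
  const tokens = words.map((word, i) => {
    const start = corrected.indexOf(word, offset);
    const end = start + word.length;
    offset = end;
    return { text: word, start: start, end: end, id: i, ws: i < words.length - 1 };
  });
  window.prodigy.update({ text: corrected, tokens: tokens, spans: [], relations: [] });
}
"""

REFRESH_BUTTON_BLOCK = {
    "view_id": "html",
    "html_template": '<button onclick="refreshText()">Update text</button>',
}


def with_refresh_button(config):
    """Return a copy of a blocks config with the button and script added."""
    new_config = copy.deepcopy(config)
    blocks = new_config.setdefault("blocks", [])
    # Place the button right after the text_input block, or at the end.
    idx = next(
        (i + 1 for i, b in enumerate(blocks) if b.get("view_id") == "text_input"),
        len(blocks),
    )
    blocks.insert(idx, dict(REFRESH_BUTTON_BLOCK))
    new_config["javascript"] = REFRESH_JS
    return new_config
```

Resetting spans and relations on refresh is a design choice: offsets from before the correction may no longer line up with the new tokens, so annotators would fix the text first and label afterwards.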

Alternatively, perhaps the simplest solution would be to run this in two rounds. In round 1, you simply fix/correct transcriptions with audio.transcribe. In round 2, you use only the corrected transcriptions in rel.manual and treat it like a typical span/relations annotation task. I tend to prefer simpler tasks over trying to do everything at the same time, so I would likely choose this route.
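For the two-round route, the hand-off between rounds could be a small conversion script like the sketch below. It assumes round 1 was exported with `prodigy db-out` to JSONL and that corrected text is stored under the `transcript` key; the helper name `corrected_to_text_tasks` is made up for illustration.

```python
import json


def corrected_to_text_tasks(lines, field_id="transcript"):
    """Convert exported round-1 annotations into text tasks for round 2."""
    tasks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        eg = json.loads(line)
        # Keep only accepted examples that actually have a transcript.
        if eg.get("answer") != "accept" or not eg.get(field_id):
            continue
        tasks.append({"text": eg[field_id], "meta": eg.get("meta", {})})
    return tasks


# Example with two exported lines: one accepted, one rejected.
exported = [
    '{"transcript": "hello world", "answer": "accept"}',
    '{"transcript": "skip me", "answer": "reject"}',
]
round_two = corrected_to_text_tasks(exported)
```

The resulting tasks can be written out one JSON object per line and used as the source for rel.manual.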

Thanks again for your question and let us know if you have further questions!

Yes, that's correct. We can't go the route of running multiple rounds because we don't want our annotators to correct everything in the transcription (there can be a lot); we only want them to correct words that are relevant for span and relation labeling. I think the button to update the text may be the way to go, then.