Video and audio annotation simultaneously

Hello All,

Brand new to Prodigy (got my keys today!)

We are exploring an audio-video annotation task on a video file: in the audio we annotate the speaker, and in the video we want to draw bounding boxes around the speaker as well. So far it seems this is not directly supported by the existing built-in recipes. Please correct me if I'm wrong.

Is there a way to do this via a custom recipe? Appreciate any guidance, thanks!

Hi! 🙂 The audio and video UIs support annotating segments in the audio track, but there's currently no interface for object tracking annotation in video files.

However, if your goal is just to mark the speaker (and not to track the speaker's movement across frames), you could probably combine the audio or video interface with the image_manual interface, using a still image from the video. See the Prodigy docs on custom interfaces with blocks. The right solution really depends on your end goal: what type of structured information you want to extract, and what you're planning to use that structured data for later on.
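A minimal sketch of what such a blocks recipe could return, assuming the `audio_manual` and `image_manual` view IDs can be combined in a single blocks UI (worth confirming in the docs on custom interfaces). The function name `build_recipe_components` and the `SPEAKER` label are hypothetical. In a real recipe, this body would sit inside a function decorated with `@prodigy.recipe`, which is left out here so the sketch runs without Prodigy installed.

```python
from typing import Dict, Iterable, List


def build_recipe_components(
    dataset: str,
    stream: Iterable[Dict],
    labels: List[str],
) -> Dict:
    """Return the components dict a custom blocks recipe would return.

    Hypothetical helper; assumes audio_manual and image_manual can share
    one blocks UI. In Prodigy, wrap this in a @prodigy.recipe function.
    """
    blocks = [
        {"view_id": "audio_manual"},  # audio player + segment labelling
        {"view_id": "image_manual"},  # bounding boxes on the still frame
    ]
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",
        # the same label set is offered to both manual blocks
        "config": {"blocks": blocks, "labels": labels},
    }
```

Each task in the stream would then need both an `"audio"` and an `"image"` key (for example URLs or base64 data URIs), so the annotator hears the audio and draws boxes on the still frame in the same card.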

Hello Ines,

Thanks for your answer. We don't need to track the person across frames.

Your suggestion (an audio + image_manual recipe combined using blocks) could work for us. But I'm unclear on how to extract frames from the video inside the recipe. We want to make sure the timelines of the audio and the frame are synced, i.e. when frame 10 is displayed for annotation, we would like to play the audio at (or around) that frame to the annotator. Do you think such a setup is possible? It would be great if you have any sample code in this direction. Thanks!
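One way to keep frame and audio in sync is to preprocess each task: convert the frame index to a timestamp (`frame_idx / fps`), grab that frame as a JPEG, and cut a short audio clip around the same timestamp. The sketch below assumes OpenCV (`opencv-python`) for frame extraction and the `ffmpeg` command-line tool for cutting audio, neither of which is part of Prodigy. All function names, the `half_window` size, and the clip file naming are hypothetical. The task keys `"image"` and `"audio"` are assumed to be what the image and audio blocks read.

```python
import base64
import subprocess
from typing import Dict, List, Tuple


def frame_window(frame_idx: int, fps: float, half_window: float = 1.0) -> Tuple[float, float, float]:
    """Map a frame index to its timestamp and an audio window around it (seconds)."""
    t = frame_idx / fps
    start = max(0.0, t - half_window)  # don't seek before the start of the file
    end = t + half_window
    return t, start, end


def ffmpeg_audio_cmd(video_path: str, start: float, end: float, out_path: str) -> List[str]:
    """Build an ffmpeg command that writes the audio between start and end to a file."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek to the window start
        "-i", video_path,
        "-t", f"{end - start:.3f}",   # clip duration
        "-vn",                        # drop the video stream, keep audio only
        out_path,
    ]


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")


def extract_frame_jpeg(video_path: str, frame_idx: int) -> bytes:
    """Read a single frame with OpenCV and return it JPEG-encoded."""
    import cv2  # imported lazily so the pure helpers above work without OpenCV

    cap = cv2.VideoCapture(video_path)
    try:
        # seeking by frame index may be approximate for some codecs
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = cap.read()
        if not ok:
            raise ValueError(f"could not read frame {frame_idx} from {video_path}")
        ok, buf = cv2.imencode(".jpg", frame)
        if not ok:
            raise ValueError("JPEG encoding failed")
        return buf.tobytes()
    finally:
        cap.release()


def make_task(video_path: str, frame_idx: int, fps: float, half_window: float = 1.0) -> Dict:
    """Build one task whose image and audio both refer to the same moment."""
    t, start, end = frame_window(frame_idx, fps, half_window)
    clip_path = f"clip_{frame_idx:06d}.wav"  # hypothetical output naming
    subprocess.run(ffmpeg_audio_cmd(video_path, start, end, clip_path), check=True)
    with open(clip_path, "rb") as f:
        audio_uri = to_data_uri(f.read(), "audio/wav")
    image_uri = to_data_uri(extract_frame_jpeg(video_path, frame_idx), "image/jpeg")
    return {
        "image": image_uri,
        "audio": audio_uri,
        "meta": {"frame": frame_idx, "time": round(t, 3),
                 "audio_start": round(start, 3), "audio_end": round(end, 3)},
    }
```

The frame rate can be read with `cv2.VideoCapture(path).get(cv2.CAP_PROP_FPS)`, and a stream could then be built as, for example, `(make_task(path, i, fps) for i in range(0, n_frames, 25))`. Cutting the audio ahead of time keeps the recipe itself simple: the annotator only ever hears the window that belongs to the displayed frame.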