Video and audio annotation simultaneously

Hi! 🙂 The audio and video UIs support annotating segments in the audio track, but there's currently no interface for object tracking annotation in video files.

However, if your goal is just to mark the speaker (and not to track the speaker's movement across frames etc.), you could probably combine the audio or video interface with the image_manual interface and a still image from the video. See the docs on custom interfaces with blocks for how to set this up. The right solution really depends on your end goal: what type of structured information you want to extract, and what you're planning to use that structured data for later on.
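
To make the blocks idea a bit more concrete, here's a rough sketch of what a custom recipe could return. It's plain Python and deliberately doesn't import Prodigy, so treat it as an illustration rather than a tested recipe. The function names, the label names and the `frame` key are made up for this example. I'm also assuming that the `audio_manual` block picks up the task's `"video"` key, that `image_manual` reads `"image"`, and that both blocks share the labels from the config, so please double-check those details against the docs for your version.

```python
from typing import Dict, Iterable, Iterator, List


def make_tasks(pairs: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield one task per clip, carrying both the video and a still frame.

    Each input dict uses the hypothetical keys "video" and "frame".
    """
    for pair in pairs:
        yield {"video": pair["video"], "image": pair["frame"]}


def speaker_blocks_components(dataset: str, pairs: List[Dict[str, str]]) -> dict:
    """Build the dict of components a custom recipe function would return.

    In a real recipe, this function would be wrapped with the
    @prodigy.recipe decorator and the paths would come from a loader.
    """
    blocks = [
        # Block 1: mark segments in the audio track of the video
        {"view_id": "audio_manual"},
        # Block 2: mark the speaker in a still image taken from the video
        {"view_id": "image_manual"},
    ]
    return {
        "dataset": dataset,
        "stream": make_tasks(pairs),
        "view_id": "blocks",
        # Assumption: labels defined here are shared by both blocks
        "config": {"labels": ["SPEAKER_1", "SPEAKER_2"], "blocks": blocks},
    }


if __name__ == "__main__":
    components = speaker_blocks_components(
        "speaker_annotations",
        [{"video": "clip_01.mp4", "frame": "clip_01_frame.jpg"}],
    )
    print(components["view_id"])
    print([block["view_id"] for block in components["config"]["blocks"]])
```

You'd still need to extract the still frames yourself (for example with ffmpeg) before streaming them in, and whether one frame per clip is enough depends on what you want to do with the data afterwards.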