Audio UI enhancement: keyboard shortcuts and clickthrough

Is there a mechanism to initiate (and then conclude) an audio_span via keyboard shortcut?

In an ideal world:

  1. I could map e.g. spacebar to an action like toggle_span, so that I could do a rough cut of audio annotation without needing to take my hands off the keyboard
  2. the new span would get whatever label option is currently selected
  3. selecting an existing annotation span (in order to shift or remove it) would be a separate action from opening a new span
  4. the second press of the spacebar would close the span that had been opened by the initial press.

Point 3 is especially important because my use case involves overlapping voices. I'd like to be able to do a pass through my looping audio while applying label-1, then switch to label-2 by pressing '2' on my keyboard, and be able to mark the onset of a label-2 span even in the middle of an existing label-1 span by hitting spacebar (or whatever).

If the 'close active open span' action had to be mapped to a different keystroke, that'd be fine too (although marginally less smooth).

If there were a distinct action for toggling an audio_span that was basically "remove if span exists" or "truncate span to this cursor position" or something, I'd want to be able to assign that to a different keyboard shortcut.
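To make the requested behaviour concrete, here is a rough sketch of the toggle logic in plain Python. It is not Prodigy's API: the SpanToggler class and its select_label and toggle methods are hypothetical names invented for illustration, and the start/end/label keys are only assumed to mirror the shape of an audio_spans entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpanToggler:
    """Hypothetical helper: tracks one open span while a cursor moves through audio."""

    label: str  # currently selected label option
    spans: List[Dict] = field(default_factory=list)
    open_start: Optional[float] = None  # start time of the span being drawn, if any
    open_label: Optional[str] = None  # label captured when the span was opened

    def select_label(self, label: str) -> None:
        # Switching labels (e.g. pressing "2") never touches existing spans.
        self.label = label

    def toggle(self, cursor: float) -> None:
        # First press opens a span at the cursor, even if the cursor sits
        # inside an existing span; the second press closes it.
        if self.open_start is None:
            self.open_start = cursor
            self.open_label = self.label
            return
        start, end = sorted((self.open_start, cursor))
        if end > start:  # ignore zero-length spans
            self.spans.append({"start": start, "end": end, "label": self.open_label})
        self.open_start = None
        self.open_label = None


# Usage: a label-1 pass, then a label-2 span opened in the middle of it.
toggler = SpanToggler(label="label-1")
toggler.toggle(1.0)
toggler.toggle(5.0)
toggler.select_label("label-2")
toggler.toggle(2.5)
toggler.toggle(3.5)
```

Because toggle never inspects the existing spans, opening a label-2 span inside a label-1 span needs no special case. Selecting, shifting, or removing a span would live in separate actions, as in point 3.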

When annotating audio via audio.manual or a similar custom recipe, it is common for audio_spans requiring different labels to be partially or fully overlapping.

In those instances, it appears that an out-of-the-box audio.manual approach does not support initiating a new audio_span at a point that is already encompassed by an existing audio_span; you have to either

  1. start at the end (assuming it ends after the existing span ends) and trace it backward to its origin, or
  2. temporarily displace the existing span out of the way and replace it once the current span's boundaries have been defined.

For spans that are fully enclosed by another span, approach 2 is the only option.

Is there a mechanism to allow, for instance, an initial click to select the top-level span, and a second click to engage with the underlying audio waveform? If not, would it be possible to enable such an interaction sequence via custom JavaScript, or is there an alternative that would accomplish the same goal: initiating an overlapping audio_span without needing to displace the original?

Thanks for the detailed enhancement suggestions! I merged both topics into one thread because they're both related to the same interface and your specific task.

For the clickthrough mechanism, we could consider something similar to what the image_manual UI has: a keyboard shortcut that lets you toggle clickthrough on and off. The main challenge for the audio UI is the integration with WaveSurfer, so I'd have to see what's possible.

Thanks for considering them! In the meantime, one of the workarounds we're considering is running multiple rounds of annotation on the same input stream (à la Multi-stage speaker audio classification with pyannote.sad.manual and audio manual). It's less than ideal, as much because of the context-switching as because of the additional overhead of getting the audio input chunking to work. But if others stumble across this thread before that enhancement makes it onto the roadmap and into reality, it's one possible resolution.
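For anyone trying the multi-round route, a minimal sketch of recombining the per-label passes afterwards might look like the following. It assumes each annotated example is a dict carrying an _input_hash that identifies the audio input and an audio_spans list of start/end/label dicts. merge_audio_spans is a hypothetical helper name, not a Prodigy function.

```python
from typing import Dict, Iterable, List


def merge_audio_spans(examples: Iterable[Dict]) -> List[Dict]:
    """Combine spans from several annotation passes over the same audio input."""
    merged: Dict[int, Dict] = {}
    for eg in examples:
        key = eg["_input_hash"]  # assumption: identical for the same audio input
        if key not in merged:
            # Copy the first example we see, but start with a fresh span list.
            merged[key] = {**eg, "audio_spans": []}
        merged[key]["audio_spans"].extend(eg.get("audio_spans", []))
    for eg in merged.values():
        # Sort by time so overlapping spans sit next to each other.
        eg["audio_spans"].sort(key=lambda span: (span["start"], span["end"]))
    return list(merged.values())


# Usage: one pass per label over the same file.
pass_1 = [{"_input_hash": 1, "audio": "a.wav",
           "audio_spans": [{"start": 1.0, "end": 5.0, "label": "label-1"}]}]
pass_2 = [{"_input_hash": 1, "audio": "a.wav",
           "audio_spans": [{"start": 2.5, "end": 3.5, "label": "label-2"}]}]
combined = merge_audio_spans(pass_1 + pass_2)
```

Overlapping spans are kept as they are rather than merged or truncated, since the overlap is the information being annotated.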