Audio UI enhancement: keyboard shortcuts and clickthrough

Is there a mechanism to initiate (and then conclude) an audio_span via a keyboard shortcut?

In an ideal world:

  1. I could map, e.g., the spacebar to an action like toggle_span, so I could do a rough cut of the audio annotation without taking my hands off the keyboard;
  2. the new span would get whichever label is currently selected;
  3. selecting an existing span (to shift or remove it) would use a separate, distinct mechanism; and
  4. a second press of the spacebar would close the span opened by the first press.

Point 3 is especially important because my use case involves overlapping voices. I'd like to make a pass through my looping audio applying label-1, switch to label-2 by pressing '2' on my keyboard, and then mark the onset of a label-2 span by hitting the spacebar (or whatever), even in the middle of an existing label-1 span.

If the 'close active open span' action had to be mapped to a different keystroke, that'd be fine too (although marginally less smooth).

If there were a distinct action for toggling an audio_span that was basically "remove if span exists" or "truncate span to this cursor position" or something, I'd want to be able to assign that to a different keyboard shortcut.
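To make the requested behaviour concrete, here's a minimal sketch of the toggle logic as a plain state machine, assuming spans are stored like entries in an audio_spans list (start, end, label). None of this is a Prodigy API: SpanToggler, toggle_span, select_label and truncate_at are hypothetical names, and wiring them to real key presses and the audio cursor would still need custom JavaScript in the UI.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpanToggler:
    """Hypothetical model of a keyboard-driven toggle_span action (not a Prodigy API).

    Spans are dicts shaped like audio_spans entries ({"start", "end", "label"});
    that shape is an assumption for illustration.
    """

    label: str = "label-1"                          # label currently selected (e.g. via '1', '2')
    spans: List[dict] = field(default_factory=list)  # finished spans
    open_start: Optional[float] = None              # cursor time of the first press, if a span is open
    open_label: Optional[str] = None                # label captured when the span was opened

    def select_label(self, label: str) -> None:
        # Switching labels does not affect a span that is already open.
        self.label = label

    def toggle_span(self, cursor: float) -> None:
        # First press opens a span at the cursor; the second press closes it.
        # Existing spans are never hit-tested, so a new span can start inside one.
        if self.open_start is None:
            self.open_start, self.open_label = cursor, self.label
            return
        start, end = sorted((self.open_start, cursor))
        if end > start:  # ignore zero-length spans from an accidental double tap
            self.spans.append({"start": start, "end": end, "label": self.open_label})
        self.open_start = self.open_label = None

    def truncate_at(self, cursor: float) -> None:
        # Separate action (its own shortcut): cut the most recently added span
        # with the current label that contains the cursor so it ends at the cursor.
        for span in reversed(self.spans):
            if span["label"] == self.label and span["start"] < cursor < span["end"]:
                span["end"] = cursor
                return


if __name__ == "__main__":
    t = SpanToggler()
    t.toggle_span(1.0)
    t.toggle_span(5.0)          # label-1 from 1.0 to 5.0
    t.select_label("label-2")
    t.toggle_span(2.5)
    t.toggle_span(4.0)          # label-2 nested inside the label-1 span
    print(t.spans)
    # [{'start': 1.0, 'end': 5.0, 'label': 'label-1'}, {'start': 2.5, 'end': 4.0, 'label': 'label-2'}]
```

The key design choice is that toggle_span never looks at existing spans, which is what would let you start a label-2 span in the middle of a label-1 span; selecting, moving, or truncating spans is kept on separate actions.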

When annotating audio via audio.manual or a similar custom recipe, it is common for audio_spans requiring different labels to be partially or fully overlapping.

In those instances, the out-of-the-box audio.manual interface doesn't appear to support starting a new audio_span at a point already covered by an existing audio_span; you have to either:

  1. start at the end (assuming it ends after the existing span ends) and trace it backward to its origin, or
  2. temporarily move the existing span out of the way and put it back once the current span's boundaries have been defined.

For spans that are fully contained within other spans, approach 2 is the only option.

Is there a mechanism that would allow, for instance, an initial click to select the top-level span and a second click to engage with the underlying audio waveform? If not, would it be possible to enable such an interaction sequence via custom JavaScript, or to come up with an alternative that achieves the same goal: starting an overlapping audio_span without having to displace the original?
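As an illustration of the two-click sequence, here's a minimal model of the routing decision, assuming the last-drawn span counts as the "top-level" one. ClickRouter and its methods are hypothetical and not part of Prodigy or WaveSurfer; in practice this logic would have to live in custom JavaScript hooked into the audio UI's click handling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClickRouter:
    """Hypothetical two-click routing for overlapping spans (not a Prodigy or WaveSurfer API).

    A click on a span that isn't selected yet selects it; a click on the span
    that is already selected falls through to the waveform and starts a new span.
    """

    spans: List[dict]               # dicts with "start"/"end" in seconds (assumed shape)
    selected: Optional[int] = None  # index of the currently selected span, if any

    def hit(self, t: float) -> Optional[int]:
        # Assumption: later spans are drawn on top, so search from the end.
        for i in range(len(self.spans) - 1, -1, -1):
            if self.spans[i]["start"] <= t <= self.spans[i]["end"]:
                return i
        return None

    def click(self, t: float) -> tuple:
        i = self.hit(t)
        if i is not None and i != self.selected:
            self.selected = i
            return ("select", i)
        # Empty waveform, or a second click on the selected span: click through.
        self.selected = None
        return ("create", t)
```

For example, with a label-2 span from 2.0 to 3.0 drawn over a label-1 span from 1.0 to 5.0, clicking at 2.5 once would select the label-2 span, and clicking at 2.5 again would start a new span there instead.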

Thanks for the detailed enhancement suggestions! I merged both topics into one thread because they're both related to the same interface and your specific task.

For the clickthrough mechanism, we could consider something similar to what the image_manual UI has: a keyboard shortcut that lets you toggle clickthrough/not clickthrough. The main challenge for the audio UI is mostly the integration with WaveSurfer, so I'd have to see what's possible.


Thanks for considering them! In the meantime, one workaround we're considering is running multiple rounds of annotation on the same input stream (à la Multi-stage speaker audio classification with pyannote.sad.manual and audio manual). It's less than ideal, as much because of the context-switching as because of the extra overhead of getting the audio input chunking to work, but if others stumble across this thread before the enhancement makes it onto the roadmap, it's one possible resolution.

The clickthrough enhancement for fully overlapping labels is something I'd also be very interested in. I think I'd prefer a keyboard modifier for dragging a region instead of click-through, making adding a region the default behaviour. Or, if that's not possible, simply disable dragging of regions (resizing at the edges should still be allowed). If I understand this PR correctly, that should fix it:


Ah cool, thanks for the pointer! I will try this out :+1: And now that I think about it, a shortcut for resizing/selecting would probably also make it consistent with the image annotation UI, which defaults to clickthrough and lets you click on the whole shape by pressing shift.


Hey Ines,

Do you have an update or ETA on when this would land in the nightly? It currently prevents me from using Prodigy to label our multi-label dataset.


We don't have an ETA for this currently, sorry! I do think it's a good feature request, but I don't want to disrupt your project plans by asking you to wait for it. You can keep an eye on the changelog here to see when it lands, and I'll also be updating this thread :slightly_smiling_face:

Hey, Ines! Thanks for all the work you and the team have done on Prodigy. Any update on whether this feature is on a near- or medium-term roadmap?