suggest update to audio docs

The recipe example listed here in the official docs appears to have a syntax error:

prodigy pyannote.sad.manual speech_activity ./data/wav --chunk 5

Would it be possible to correct it?

There appear to be at least two issues:

  1. The -- prefix to the chunk argument should be -
  2. The -chunk argument needs to be inserted prior to the dataset and source, rather than at the end

The pyannote source for the Prodigy recipe appears to indicate that the -chunk argument expects a float, not an int, but this example passes an int. The cast appears to happen automatically, so it's not the end of the world, but updating the recipe example to pass a float would remove one more avenue of potential confusion.

(prodigy) → prodigy pyannote.sad.manual speech_activity ./wav --chunk 5.0
# chokes as follows:
usage: prodigy pyannote.sad.manual [-h] [-chunk 10.0] [-speed 1.0] dataset source
prodigy pyannote.sad.manual: error: unrecognized arguments: --chunk 5.0

(prodigy) → prodigy pyannote.sad.manual --chunk 5.0 speech_activity ./wav
# chokes as follows:
usage: prodigy pyannote.sad.manual [-h] [-chunk 10.0] [-speed 1.0] dataset source
prodigy pyannote.sad.manual: error: unrecognized arguments: --chunk ./wav

(prodigy) → prodigy pyannote.sad.manual -chunk 5.0 speech_activity ./wav
# works as expected
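
For anyone curious why the two failing calls report different leftover arguments, here is a minimal sketch that reproduces the behaviour with a plain argparse parser. It is only a stand-in: the parser below is hypothetical and is not the actual pyannote recipe code. It assumes nothing more than that the option is registered under the single-dash name -chunk with a float type, as the usage line suggests.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the recipe's CLI, mirroring the usage line above.
    parser = argparse.ArgumentParser(prog="prodigy pyannote.sad.manual")
    parser.add_argument("dataset")
    parser.add_argument("source")
    parser.add_argument("-chunk", type=float, default=10.0)
    parser.add_argument("-speed", type=float, default=1.0)
    return parser


def try_parse(argv):
    """Return (namespace, leftovers) instead of exiting on unknown arguments."""
    return build_parser().parse_known_args(argv)


# Single dash: recognised, and the int-looking "5" is cast to a float.
ok, ok_extras = try_parse(["-chunk", "5", "speech_activity", "./wav"])

# Double dash at the end: "--chunk" is unknown, so it and its value are left over.
end, end_extras = try_parse(["speech_activity", "./wav", "--chunk", "5.0"])

# Double dash at the front: "--chunk" is unknown, "5.0" and "speech_activity"
# fill the two positional slots, and "./wav" is left over.
front, front_extras = try_parse(["--chunk", "5.0", "speech_activity", "./wav"])

print(ok.chunk, ok_extras)
print(end.chunk, end_extras)
print(front.dataset, front.source, front_extras)
```

In this sketch the leftovers correspond to the "unrecognized arguments" in the two error messages above, and the first call shows the automatic int-to-float cast.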

With a freshly created Prodigy environment, supplemented by the develop branch of the pyannote GitHub repo, the error trace below is the consistent result across multiple .wav files. The files were created from other source files via ffmpeg -i source-file.mp3 -f s16le -ar 16k -ac 1 destination-file.wav, and we were able to pass all of them through a pyannote-driven SAD inference process when working outside of Prodigy. Any guidance, please?

(prodigy) → ls -lah
total 110288
drwxr-xr-x@  5 tsslade  staff   160B Aug 25 21:00 .
drwxr-xr-x@ 26 tsslade  staff   832B Aug 25 20:59 ..
-rw-r--r--@  1 tsslade  staff    47M Jun 25 19:14 1593136205820.wav

(prodigy) → prodigy pyannote.sad.manual speech_activity .
Using cache found in /Users/tsslade/.cache/torch/hub/pyannote_pyannote-audio_master
Using cache found in /Users/tsslade/.cache/torch/hub/pyannote_pyannote-audio_master
/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/pyannote/audio/embedding/approaches/arcface_loss.py:170: FutureWarning: The 's' parameter is deprecated in favor of 'scale', and will be removed in a future release
  warnings.warn(msg, FutureWarning)
Traceback (most recent call last):
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/prodigy/__main__.py", line 60, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 318, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 138, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 56, in prodigy.components.feeds.SharedFeed.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 155, in prodigy.components.feeds.SharedFeed.validate_stream
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/toolz/itertoolz.py", line 376, in first
    return next(iter(seq))
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/pyannote/audio/interactive/recipes/sad.py", line 99, in sad_manual_stream
    speech: Annotation = pipeline.compute_speech(file).to_annotation(
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/pyannote/audio/interactive/pipeline.py", line 170, in compute_speech
    sad_scores = self.sad(current_file)
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/pyannote/audio/features/wrapper.py", line 280, in __call__
    return self.scorer_(current_file)
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/pyannote/audio/features/base.py", line 149, in __call__
    y, sample_rate = self.raw_audio_(current_file, return_sr=True)
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/pyannote/audio/features/utils.py", line 237, in __call__
    y = self.get_features(y, sample_rate)
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/pyannote/audio/features/utils.py", line 173, in get_features
    y = librosa.core.resample(y.T, sample_rate, self.sample_rate).T
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/librosa/core/audio.py", line 548, in resample
    util.valid_audio(y, mono=False)
  File "/Users/tsslade/miniconda3/envs/prodigy/lib/python3.8/site-packages/librosa/util/utils.py", line 305, in valid_audio
    raise ParameterError(
librosa.util.exceptions.ParameterError: Mono data must have shape (samples,). Received shape=(1, 24682736)

Thanks for the heads-up, I'll fix this for now! I should probably submit a small PR to pyannote.audio that adjusts the argument for consistency (I think if it specifies a separate abbreviation the usage would be --chunk vs. -c etc.). With -chunk, it does work for me even at the end of the command, though?

About the error you shared: it looks like this occurs when scoring your input data within pyannote.audio. I don't know exactly how the setup here differs from what you've been running outside of Prodigy, but there's something it doesn't like about your data. It might be better to open an issue on the pyannote.audio tracker, since this seems more specific to the integration and model setup in the recipe than to Prodigy itself 🙂

Thank you for redirecting me to the pyannote maintainers, @ines! Just to close the circle, I filed this issue and we were able to resolve it very quickly.
The underlying issue was a breaking change introduced by one of pyannote's dependencies (librosa) when it moved from v0.7.2 to v0.8. pyannote-audio has now been pinned to the earlier version of the librosa library, and the recipe works when run within an updated env.
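
In case it helps anyone hitting the same traceback, here is a small sketch for checking whether a given librosa version string predates 0.8. The helper name is hypothetical, and the only fact it relies on is the one above, namely that the breaking change arrived with the move from v0.7.2 to v0.8.

```python
import re


def predates_librosa_0_8(version: str) -> bool:
    """Return True if a version string such as '0.7.2' sorts before 0.8."""
    # Keep only the leading numeric components, e.g. '0.8.0rc1' -> (0, 8, 0).
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(piece) for piece in match.group().split("."))
    return parts < (0, 8)


print(predates_librosa_0_8("0.7.2"))
print(predates_librosa_0_8("0.8.0"))
```

To check the active environment, the installed version string could be obtained with importlib.metadata.version("librosa") and passed to the helper.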
