Audio Transcription | Input Hash Error

python3 -m prodigy audio.transcribe speaker_transcripts ./recordings

Greetings,
I am receiving the following error.

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/derrick/.local/lib/python3.7/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 339, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 366, in prodigy.core._components_to_ctrl
  File "cython_src/prodigy/core.pyx", line 125, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 102, in prodigy.components.feeds.Feed.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 148, in prodigy.components.feeds.Feed._init_stream
  File "cython_src/prodigy/components/stream.pyx", line 107, in prodigy.components.stream.Stream.__init__
  File "cython_src/prodigy/components/stream.pyx", line 58, in prodigy.components.stream.validate_stream
  File "cython_src/prodigy/components/loaders.pyx", line 29, in _add_attrs
  File "cython_src/prodigy/components/filters.pyx", line 48, in filter_duplicates
KeyError: '_input_hash'

Hi @derrickjnet, thanks for reporting. For the record, I can replicate this with an MP3 audio file. Will look into it!

Awesome, please keep me updated. We're using prodigy solely for audio.

Hi @derrickjnet, a bugfix is already underway and will be part of the next release.

In the meantime, you can patch it yourself so you can keep working (it's just a one-line change). We ship the uncompiled source of the recipes, and they're accessible on your machine.

First, you need to know the location where prodigy was installed. You can do so by running:

prodigy stats

This will reveal the Location of the prodigy files. If you're on Linux, it's usually in the .local directory of your home folder (specifically under site-packages). Then head over to path/to/prodigy/recipes/ (adjust the Python version in the path to match your installation):

cd path/to/.local/lib/python3.8/site-packages/prodigy/recipes

and open audio.py.
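If you'd rather find the folder from Python than read it off the stats output, something like the sketch below should work. It only uses the standard library; package_dir is a helper name made up for this example, and the recipes/audio.py layout is assumed from the path above rather than checked against every version.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the install directory of a package, or None if it can't be found.

    package_dir is a hypothetical helper for this example, not part of Prodigy.
    """
    spec = importlib.util.find_spec(name)
    # find_spec returns None for packages that are not installed;
    # origin is None for namespace packages without an __init__ file.
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent
```

For example, package_dir("prodigy") / "recipes" / "audio.py" should point at the file to edit, assuming Prodigy is installed for the same interpreter you run the snippet with.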

There, you can edit the get_stream call inside the transcribe function. You just need to add rehash=True:

- stream = get_stream(source, loader=loader, dedup=True, is_binary=False)
+ stream = get_stream(source, loader=loader, rehash=True, dedup=True, is_binary=False)

In my Prodigy version (1.11.4), this is on line 108. Remember to make the change within the transcribe() function.
Lastly, save the file and try rerunning your command. Hope it helps!
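
If you need to apply the same change on several machines, the edit could also be scripted. This is only a rough sketch under a few assumptions: the function is defined with a plain "def transcribe(" line, the get_stream call matches the line in the diff above character for character, and patch_transcribe is a name made up for this example. It keeps a .bak copy next to the file and leaves other functions alone.

```python
import shutil
from pathlib import Path

# The two lines from the diff above (assumed to match your version exactly).
OLD = "stream = get_stream(source, loader=loader, dedup=True, is_binary=False)"
NEW = "stream = get_stream(source, loader=loader, rehash=True, dedup=True, is_binary=False)"


def patch_transcribe(path):
    """Add rehash=True to the get_stream call inside transcribe().

    Returns True if the file was changed, False if nothing matched.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")

    # Restrict the search to the body of transcribe(): from its def line
    # up to the next top-level def (or the end of the file).
    start = text.find("def transcribe(")
    if start == -1:
        return False
    end = text.find("\ndef ", start + 1)
    if end == -1:
        end = len(text)

    pos = text.find(OLD, start, end)
    if pos == -1:
        return False  # already patched, or the line differs in this version

    # Keep a backup copy before overwriting the recipe file.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(text[:pos] + NEW + text[pos + len(OLD):], encoding="utf-8")
    return True
```

Call it as patch_transcribe("path/to/prodigy/recipes/audio.py"). If it returns False, the line probably looks different in your version, and editing by hand as described above is the safer route.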