Audio Transcription | Input Hash Error

derrickjnet · November 5, 2021, 3:18pm

python3 -m prodigy audio.transcribe speaker_transcripts ./recordings

Greetings,
I am receiving the following error when I run the command above:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/derrick/.local/lib/python3.7/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 339, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 366, in prodigy.core._components_to_ctrl
  File "cython_src/prodigy/core.pyx", line 125, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 102, in prodigy.components.feeds.Feed.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 148, in prodigy.components.feeds.Feed._init_stream
  File "cython_src/prodigy/components/stream.pyx", line 107, in prodigy.components.stream.Stream.__init__
  File "cython_src/prodigy/components/stream.pyx", line 58, in prodigy.components.stream.validate_stream
  File "cython_src/prodigy/components/loaders.pyx", line 29, in _add_attrs
  File "cython_src/prodigy/components/filters.pyx", line 48, in filter_duplicates
KeyError: '_input_hash'

ljvmiranda921 · November 9, 2021, 1:29am

Hi @derrickjnet, thanks for reporting. For the record, I can replicate this with an MP3 audio file. Will look into it!

derrickjnet · November 9, 2021, 2:23am

Awesome, please keep me updated. We're using Prodigy solely for audio.

ljvmiranda921 · November 10, 2021, 3:17am

Hi @derrickjnet, a bugfix is already underway and will be part of the next release.

In the meantime, you can patch it yourself so you can keep working (it's just a one-line change). We ship the uncompiled source of the recipes, so they're accessible on your machine.

First, you need to know where Prodigy is installed. You can find out by running:

prodigy stats

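This will print the Location of the Prodigy files. If you're on Linux, it's usually in the .local directory of your home folder (specifically under site-packages). Then head over to path/to/prodigy/recipes/. The Python version in the path depends on your setup (the traceback above shows python3.7, for example):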

cd path/to/.local/lib/python3.8/site-packages/prodigy/recipes

and open audio.py.
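
If prodigy stats is not convenient, the same location can be looked up from Python. This is only a minimal sketch using the standard library's importlib; the helper name package_dir is made up for illustration, and the recipes/audio.py layout is taken from the path described above:

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed package, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    # Plain modules (not packages) have no submodule_search_locations.
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    root = package_dir("prodigy")
    if root is None:
        print("prodigy is not installed in this Python environment")
    else:
        # Assumed layout, as described in this thread: <package>/recipes/audio.py
        print(root / "recipes" / "audio.py")
```

Run it with the same interpreter you use for Prodigy (python3 in the original command), otherwise it may report a different installation or none at all.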

There, edit the get_stream call inside the transcribe function. You just need to add rehash=True:

- stream = get_stream(source, loader=loader, dedup=True, is_binary=False)
+ stream = get_stream(source, loader=loader, rehash=True, dedup=True, is_binary=False)

In my Prodigy version (1.11.4), this is on line 108. Remember to make the change within the transcribe() function.
Lastly, save the file and rerun your command. Hope it helps!
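
For anyone who would rather script the edit than make it by hand, here is a rough sketch of the same one-line change as a text substitution. It assumes the call in your copy of audio.py matches the "-" line above character for character and that the function is introduced by "def transcribe("; the helper name add_rehash is hypothetical, and the check is only textual (it patches the first matching call that appears after the function definition), so review the result before relying on it:

```python
OLD_CALL = "get_stream(source, loader=loader, dedup=True, is_binary=False)"
NEW_CALL = "get_stream(source, loader=loader, rehash=True, dedup=True, is_binary=False)"


def add_rehash(source_code: str, function_name: str = "transcribe") -> str:
    """Add rehash=True to the first matching get_stream call after `def <function_name>(`."""
    start = source_code.find(f"def {function_name}(")
    if start == -1:
        raise ValueError(f"no function named {function_name!r} found")
    # Leave everything before the function untouched.
    head, tail = source_code[:start], source_code[start:]
    if OLD_CALL not in tail:
        raise ValueError("expected get_stream call not found; the file may already be patched")
    # Replace only the first occurrence after the function definition.
    return head + tail.replace(OLD_CALL, NEW_CALL, 1)
```

To apply it, read the text of audio.py, pass it through add_rehash, and write the result back, ideally after making a backup copy of the original file.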
