I’m trying to prevent sentences from being split in the ner.teach recipe:
def ner_teach_wrapper(dataset, spacy_model, language, label=None, unsegmented=True):
When I set unsegmented=True, it throws the following error. Everything works perfectly fine when I leave the unsegmented option at its default setting.
tasks = controller.get_questions()
  File "cython_src/prodigy/core.pyx", line 87, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/core.pyx", line 71, in iter_tasks
  File "cython_src/prodigy/components/sorters.pyx", line 136, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 51, in <genexpr>
  File "cython_src/prodigy/models/ner.pyx", line 260, in __call__
  File "cython_src/prodigy/models/ner.pyx", line 228, in get_tasks
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "cython_src/prodigy/models/ner.pyx", line 206, in predict_spans
I can't find the cause of this. Any idea?
Thanks for the report! This is strange… for some reason, the hashes don't seem to get added correctly to the stream, even though the loader should take care of this. The split_sentences preprocessor (which is used if you do want to segment the text) rehashes the stream again after segmenting, so I guess that's why the problem doesn't occur there. It's still mysterious, though, because I don't understand how the hashes would get lost…
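For intuition, the rehashing idea can be sketched with plain hashlib. This is only an illustration, not Prodigy's actual set_hashes implementation, though the `_input_hash` and `_task_hash` key names do match the ones Prodigy attaches: the input hash depends only on the raw input text, while the task hash also covers the suggested annotations, so the same sentence with different candidate spans yields distinct tasks.

```python
import hashlib
import json

def rehash(stream):
    """Yield examples with fresh _input_hash/_task_hash keys (illustrative only)."""
    for eg in stream:
        # Input hash: based on the raw input text only.
        text = eg.get("text", "")
        eg["_input_hash"] = int.from_bytes(
            hashlib.md5(text.encode("utf8")).digest()[:4], "big"
        )
        # Task hash: based on the whole task, i.e. text plus suggested spans.
        task_key = json.dumps(
            {"text": text, "spans": eg.get("spans", [])}, sort_keys=True
        )
        eg["_task_hash"] = int.from_bytes(
            hashlib.md5(task_key.encode("utf8")).digest()[:4], "big"
        )
        yield eg

first, second = list(rehash([
    {"text": "Hello world"},
    {"text": "Hello world", "spans": [{"start": 0, "end": 5}]},
]))
# Same input text -> same input hash; different spans -> different task hash.
assert first["_input_hash"] == second["_input_hash"]
assert first["_task_hash"] != second["_task_hash"]
```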
I’ll investigate this – pretty sure we can still get a fix in for the upcoming release!
In the meantime, you can try loading and hashing your stream manually before you pass it into ner.teach. If you're calling the ner.teach recipe function directly from your wrapper, you can also pass in an already loaded stream as the source argument (instead of a string). Here's an example of the loading and hashing:
from prodigy.components.loaders import JSONL  # or however you want to load it
from prodigy.util import set_hashes

stream = JSONL(your_source)  # generator of example dicts with a "text" key
stream = (set_hashes(eg) for eg in stream)  # lazily add input and task hashes
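To see the whole pattern end to end without Prodigy installed, here's a self-contained sketch using only the standard library. The `add_hashes` helper is a hypothetical stand-in for prodigy.util.set_hashes, and the in-memory StringIO stands in for your JSONL file; the point is just that every example carries both hash keys before the stream would ever reach ner.teach:

```python
import hashlib
import io
import json

def add_hashes(eg):
    """Hypothetical stand-in for prodigy.util.set_hashes (illustrative only)."""
    eg["_input_hash"] = int.from_bytes(
        hashlib.md5(eg.get("text", "").encode("utf8")).digest()[:4], "big"
    )
    eg["_task_hash"] = int.from_bytes(
        hashlib.md5(json.dumps(eg, sort_keys=True).encode("utf8")).digest()[:4], "big"
    )
    return eg

# Pretend this is your JSONL source file on disk.
source = io.StringIO('{"text": "Apple is a company"}\n{"text": "I like apples"}\n')

# Lazily parse and hash, mirroring the JSONL loader + set_hashes combination.
stream = (json.loads(line) for line in source if line.strip())
stream = (add_hashes(eg) for eg in stream)

# Every example now carries both hashes before the model ever sees it.
examples = list(stream)
assert all("_input_hash" in eg and "_task_hash" in eg for eg in examples)
```

The generators keep the pipeline lazy, so even a large JSONL file is hashed one example at a time rather than loaded into memory up front.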
Thanks for the prompt workaround. It works now. Looking forward to the fix in the next release.
We've just released v1.5.0, which should fix this problem. All streams that pass through the built-in recipes are now hashed before they are processed by the model.