Error after using ner.teach for a while

Hey,

Thanks for such an awesome tool and for the chance to check out the beta version.

I've gotten this error a couple of times already (I'm starting to wonder if there is something wrong with my corpus, because it's a mess). Anyway, here is the stacktrace:

→ prodigy ner.teach test_set en_core_web_sm text_data.jsonl --label DATE
Added dataset test_set to database SQLite.

✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

Exception when serving /get_questions
Traceback (most recent call last):
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/waitress/channel.py", line 338, in service
    task.service()
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/waitress/task.py", line 169, in service
    self.execute()
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/waitress/task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/hug/api.py", line 421, in api_auto_instantiate
    return module.hug_wsgi(*args, **kwargs)
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/falcon/api.py", line 242, in __call__
    responder(req, resp, **params)
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/hug/interface.py", line 692, in __call__
    self.render_content(self.call_function(input_parameters), request, response, **kwargs)
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/hug/interface.py", line 633, in call_function
    return self.interface(**parameters)
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/hug/interface.py", line 99, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/prodigy/app.py", line 58, in get_questions
    tasks = controller.get_questions()
  File "cython_src/prodigy/core.pyx", line 63, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/core.pyx", line 58, in iter_tasks
  File "cython_src/prodigy/components/sorters.pyx", line 137, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 53, in <genexpr>
  File "cython_src/prodigy/models/ner.pyx", line 215, in __call__
  File "cython_src/prodigy/models/ner.pyx", line 185, in get_tasks
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "cython_src/prodigy/models/ner.pyx", line 151, in predict_spans
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "cython_src/prodigy/components/preprocess.pyx", line 7, in split_sentences
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/spacy/language.py", line 501, in pipe
    for doc, context in izip(docs, contexts):
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/spacy/language.py", line 513, in pipe
    for doc in docs:
  File "nn_parser.pyx", line 394, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "nn_parser.pyx", line 394, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "pipeline.pyx", line 346, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/spacy/language.py", line 504, in <genexpr>
    docs = (self.make_doc(text) for text in texts)
  File "/Users/ocselvig/Code/master_thesis/env_py3/lib/python3.6/site-packages/spacy/language.py", line 497, in <genexpr>
    texts = (tc[0] for tc in text_context1)
  File "cython_src/prodigy/components/preprocess.pyx", line 6, in <genexpr>
  File "cython_src/prodigy/util.pyx", line 203, in filter_duplicates
  File "cython_src/prodigy/components/loaders.pyx", line 64, in prodigy.components.loaders.JSONL
  File "cython_src/prodigy/util.pyx", line 55, in prodigy.util.set_hashes
TypeError: 'int' object is not iterable

Any idea what it might be? Thank you!

Thanks for the report and for trying out Prodigy!

It looks like what’s going on here is that somewhere down the line, an example from text_data.jsonl is not a valid example dictionary, but an integer. So the stream of tasks ends up looking like [{}, {}, 25, {}, ...] or something like that (Prodigy’s JSONL loader simply calls ujson.loads() on each line of the file and yields the result). If that’s the case, the solution should be as simple as checking your input file, making sure each line contains a dictionary, and removing or fixing the broken lines.
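If it helps, here's a rough way to find the offending lines before feeding the file to Prodigy. This is just a minimal sketch that assumes the usual JSONL format (one JSON object per line with at least a "text" field), nothing about Prodigy's internals:

import json

# Scan text_data.jsonl and report lines that aren't valid task dicts.
with open("text_data.jsonl", encoding="utf8") as f:
    for i, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            task = json.loads(line)
        except ValueError as e:
            print("Line {}: invalid JSON ({})".format(i, e))
            continue
        if not isinstance(task, dict):
            print("Line {}: expected a dict, got {}: {!r}".format(i, type(task).__name__, task))
        elif "text" not in task:
            print("Line {}: missing 'text' key".format(i))

Any line it flags is a good candidate for the 'int' object is not iterable error you're seeing.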

We should probably add better error handling for cases like this. When retrieving the next batch of examples from the stream, Prodigy should raise its own exception if a task is not valid and show more information – for example: Encountered invalid task: 25 (<class 'int'>), expected: dict. Maybe this should even be handled directly in the loaders, since they always need to yield dictionaries, no matter what. Broken JSONL should probably also raise an error instead of just skipping the example. Edit: Already implemented and will be included in the next release.
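In the meantime, a custom recipe could also guard against this by wrapping the stream itself. The validate_stream helper below is hypothetical and not part of Prodigy's API, just a sketch of the kind of check described above:

# Hypothetical helper (not part of Prodigy's API): fail loudly as soon as
# a task in the stream is not a dictionary, instead of crashing later.
def validate_stream(stream):
    for task in stream:
        if not isinstance(task, dict):
            raise ValueError(
                "Encountered invalid task: {} ({}), expected: dict".format(task, type(task))
            )
        yield task

# Usage in a custom recipe, e.g.: stream = validate_stream(JSONL("text_data.jsonl"))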
