`ValueError` when initiating `ner.teach`.

I run the following commands:

`prodigy dataset ner_money "Improve MONEY on Earnings data"`
`prodigy ner.teach ner_money en_core_web_sm earnings_lines.jsonl --label MONEY`

but then I get the following traceback:

Traceback (most recent call last):
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/waitress/channel.py", line 338, in service
    task.service()
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/waitress/task.py", line 169, in service
    self.execute()
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/waitress/task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/hug/api.py", line 423, in api_auto_instantiate
    return module.__hug_wsgi__(*args, **kwargs)
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/falcon/api.py", line 244, in __call__
    responder(req, resp, **params)
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/hug/interface.py", line 793, in __call__
    raise exception
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/hug/interface.py", line 766, in __call__
    self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/hug/interface.py", line 703, in call_function
    return self.interface(**parameters)
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/hug/interface.py", line 100, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/prodigy/app.py", line 105, in get_questions
    tasks = controller.get_questions()
  File "cython_src/prodigy/core.pyx", line 109, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 56, in prodigy.components.feeds.SharedFeed.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 61, in prodigy.components.feeds.SharedFeed.get_next_batch
  File "cython_src/prodigy/components/feeds.pyx", line 130, in prodigy.components.feeds.SessionFeed.get_session_stream
  File "/home/bjerre/Projects/plx/fact-extractor/venv/lib/python3.6/site-packages/toolz/itertoolz.py", line 368, in first
    return next(iter(seq))
  File "cython_src/prodigy/components/sorters.pyx", line 136, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 51, in genexpr
  File "cython_src/prodigy/models/ner.pyx", line 273, in __call__
  File "cython_src/prodigy/models/ner.pyx", line 256, in get_tasks
ValueError: dictionary update sequence element #0 has length 1; 2 is required

Attached: earnings_lines.jsonl (1.3 KB)

Thanks for sharing the report and your data. Prodigy should definitely have handled this better.

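The problem here is that when the NER model adds the score to the "meta", it expects "meta" to be a dictionary. In your data, however, "meta" is a string, so Python ends up trying to read the string as a sequence of key/value pairs. That's what the "has length 1; 2 is required" part of the error refers to. Changing your records to something like this should work: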

`{"text": "...", "meta": {"something": "cision:20150716:BIT:4665:0"}}`
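If you have more than a handful of records, a small script can do the conversion. The sketch below is a hypothetical helper (not part of Prodigy). It first reproduces the error message with plain Python, showing that it comes from treating a string as a sequence of key/value pairs, and then wraps any string "meta" value in a dictionary. The key name `source` is an arbitrary placeholder; use whatever name makes sense for your IDs.

```python
import json

# Reproduce the error with plain Python: dict.update() iterates over the
# string, and each element is a single character (length 1), whereas a
# key/value pair needs length 2.
try:
    {}.update("cision:20150716:BIT:4665:0")
except ValueError as err:
    print("Reproduced:", err)


def fix_meta(record, key="source"):
    """Wrap a string "meta" value in a dict; `key` is a hypothetical name."""
    meta = record.get("meta")
    if isinstance(meta, str):
        record["meta"] = {key: meta}
    return record


def fix_jsonl(in_path, out_path, key="source"):
    """Rewrite a JSONL file so that every "meta" value is a dict."""
    with open(in_path, encoding="utf8") as fin, \
            open(out_path, "w", encoding="utf8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            record = fix_meta(json.loads(line), key)
            fout.write(json.dumps(record) + "\n")
```

For example, `fix_jsonl("earnings_lines.jsonl", "earnings_lines_fixed.jsonl")` writes a corrected copy that you can pass to `ner.teach` instead of the original file.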

Ah, yes, of course.

It works now. I'm a bit puzzled, though: it seems to present the examples in the same order as the lines in `earnings_lines.jsonl` instead of giving me the 50/50 examples (scores around 0.5). Can that be right?

Kudos to explosion.ai! I am truly amazed by the results for such minimal effort.

Yay, glad it worked!

What does the “score” (displayed in the meta in the corner) look like? Depending on what’s in the data, it might take a batch or two for the exponential moving average to calibrate.
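To illustrate why calibration takes a little while, here is a simplified, hypothetical uncertainty filter. It is not Prodigy's actual implementation. It turns each score into an uncertainty value (1.0 at a score of 0.5, 0.0 at 0 or 1) and only passes on examples that are at least as uncertain as an exponential moving average of what it has seen so far. The smoothing factor `alpha` is an assumed parameter. Because a filter like this works on a stream rather than sorting the whole file, the examples that get through still come out in file order, and for the first few examples the average rests on very little data.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

Task = Dict[str, Any]


def prefer_uncertain_sketch(
    stream: Iterable[Tuple[float, Task]], alpha: float = 0.1
) -> Iterator[Task]:
    """Yield tasks from (score, task) pairs that are at least as uncertain
    as the running average. Hypothetical illustration only.
    """
    ema: Optional[float] = None
    for score, task in stream:
        # 1.0 for a score of 0.5, falling to 0.0 for scores of 0.0 or 1.0
        uncertainty = 1.0 - 2.0 * abs(score - 0.5)
        # Early on, the average is based on very few examples
        if ema is None or uncertainty >= ema:
            yield task
        # Update the exponential moving average after the decision
        if ema is None:
            ema = uncertainty
        else:
            ema = alpha * uncertainty + (1 - alpha) * ema
```

Feeding it `(score, task)` pairs with scores 0.5, 0.99, 0.5, 0.01 and 0.45 lets through the first, third and fifth tasks: the confident ones are skipped, and the rest keep their original order.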

Btw, looking at your data, you might want to try stripping out the " * " bullet markers in your preprocessing (since you're preprocessing anyway). They could throw off the model, since it has likely never seen texts like that before. Without the bullet points, you have fairly normal sentences, on which you'd probably also get very decent dependency parsing and POS tagging accuracy.
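If you do decide to strip them, a regular expression in the preprocessing step is enough. This is a minimal sketch that assumes each bullet is a `*` at the start of a line (possibly indented) followed by whitespace; other markers such as `-` would need to be added to the pattern. The `preprocess` helper and its name are hypothetical. Keep in mind that stripping text changes character offsets, so it should happen before any spans are annotated.

```python
import re

# A "*" bullet at the start of any line (optionally indented), plus the
# spaces/tabs after it. [ \t] is used instead of \s so that the pattern
# never swallows the newlines between lines.
BULLET_RE = re.compile(r"^[ \t]*\*[ \t]+", flags=re.MULTILINE)


def strip_bullets(text: str) -> str:
    """Remove leading "* " bullet markers from every line of `text`."""
    return BULLET_RE.sub("", text)


def preprocess(record: dict) -> dict:
    """Return a copy of a task dict with bullets stripped from "text"."""
    return {**record, "text": strip_bullets(record["text"])}
```

A `*` in the middle of a line (for example `3 * 4`) is left alone, since the pattern is anchored to the line start.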

The scores were pretty uniformly distributed and not centered around 0.5. Nevertheless, I still got good results, but I will keep an eye on this. This was just meant as an initial test.

I actually intended to include the Markdown markers, my theory being that the model might be able to pick up on them. The facts I want to extract are often (but not always) found in lists like that.

Ah okay, fair enough! I'd definitely be interested in the results if you end up running experiments with and without the Markdown. This is also a nice example of the type of thing we're hoping Prodigy can help with: annotate a few hundred examples with Markdown and a few hundred without, then compare the results :slightly_smiling_face:

I will let you know. I have a three-week vacation coming up, though, so it will be after that.
