KeyError: 'start' in ner.teach

Hi, I'm getting an error message when trying to use the ner.teach recipe. I'm starting the server with:

prodigy ner.teach my_dataset_name blank_spacy_model_ner_en datasetfile.jsonl --label EVENTDATE,LOCATION --patterns patterns.jsonl

Output:

Using 2 labels: EVENTDATE, LOCATION

  ✨  Starting the web server at http://........:8080 ...
  Open the app in your browser and start annotating!

When I open the app in my browser it shows "Oops, something went wrong :(". The terminal output is:

Exception when serving /get_session_questions
Traceback (most recent call last):
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/waitress/channel.py", line 336, in service
    task.service()
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/waitress/task.py", line 175, in service
    self.execute()
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/waitress/task.py", line 452, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/hug/api.py", line 451, in api_auto_instantiate
    return module.__hug_wsgi__(*args, **kwargs)
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/falcon/api.py", line 244, in __call__
    responder(req, resp, **params)
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/hug/interface.py", line 789, in __call__
    raise exception
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/hug/interface.py", line 762, in __call__
    self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/hug/interface.py", line 698, in call_function
    return self.interface(**parameters)
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/hug/interface.py", line 100, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/prodigy/_api/hug_app.py", line 228, in get_session_questions
    tasks = controller.get_questions(session_id=session_id)
  File "cython_src/prodigy/core.pyx", line 130, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 58, in prodigy.components.feeds.SharedFeed.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 63, in prodigy.components.feeds.SharedFeed.get_next_batch
  File "cython_src/prodigy/components/feeds.pyx", line 140, in prodigy.components.feeds.SessionFeed.get_session_stream
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/toolz/itertoolz.py", line 368, in first
    return next(iter(seq))
  File "cython_src/prodigy/components/sorters.pyx", line 151, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 61, in genexpr
  File "cython_src/prodigy/util.pyx", line 380, in predict
  File "/srv/data/anaconda3/envs/prodigy/lib/python3.7/site-packages/toolz/itertoolz.py", line 234, in interleave
    yield next(itr)
  File "cython_src/prodigy/models/ner.pyx", line 292, in __call__
  File "cython_src/prodigy/models/ner.pyx", line 280, in get_tasks
  File "cython_src/prodigy/models/ner.pyx", line 253, in prodigy.models.ner.EntityRecognizer.__call__.get_tasks.sort_by_entity
KeyError: 'start'

The same input file works fine in ner.manual mode, like this:

prodigy ner.manual my_dataset_name blank_spacy_model_ner_en datasetfile.jsonl --label EVENTDATE,LOCATION

I get the same error when using the en_core_web_sm model instead of my custom blank one.
Any idea what's going on, and how to fix this?

Thanks for the report! I had a quick look, and it seems like you’re hitting a small bug in a very specific code path. I’m not yet sure how you ended up there :thinking:

Does your datasetfile.jsonl contain any pre-defined "spans" by any chance? Or is it all just entries with a "text"?
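One way to answer that question is to scan the JSONL input and count the records that carry a "spans" key. The sketch below is only an illustration: the function name summarize_spans and the sample records are hypothetical. It also counts spans that lack "start" or "end" offsets, on the assumption (not confirmed in this thread) that this is related to the KeyError: 'start' in the traceback.

```python
import json


def summarize_spans(lines):
    """Count JSONL records with pre-defined spans and spans lacking offsets."""
    total = 0
    with_spans = 0
    missing_offsets = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += 1
        record = json.loads(line)
        spans = record.get("spans") or []
        if spans:
            with_spans += 1
        for span in spans:
            # Assumption: a span without character offsets may be what
            # triggers the KeyError: 'start' seen in the traceback.
            if "start" not in span or "end" not in span:
                missing_offsets += 1
    return {
        "records": total,
        "with_spans": with_spans,
        "spans_missing_offsets": missing_offsets,
    }


# Hypothetical sample records, for illustration only.
sample = [
    '{"text": "Concert in Berlin on 5 May"}',
    '{"text": "Concert in Berlin", "spans": [{"start": 11, "end": 17, "label": "LOCATION"}]}',
    '{"text": "Fair on 5 May", "spans": [{"text": "5 May", "label": "EVENTDATE"}]}',
]
print(summarize_spans(sample))
```

To run it against the real file, pass an open file handle instead of the sample list, e.g. summarize_spans(open("datasetfile.jsonl", encoding="utf8")).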

Thanks for the quick response. Yes, the dataset contains pre-defined spans of the types I’m trying to teach (EVENTDATE, LOCATION). They cover the same text as the patterns in my patterns.jsonl file.

I’ll try removing the spans and see if that fixes the problem.
EDIT: Yes, it works after removing the spans. It would be great if Prodigy could either ignore existing spans or raise an error message that’s easier to understand :wink:
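For anyone hitting the same error, the workaround of removing the spans can be scripted. This is a minimal sketch using only the standard library; the function name strip_spans and the sample records are hypothetical, and it assumes every non-blank line of the input is a JSON object.

```python
import json


def strip_spans(lines):
    """Yield JSONL lines with the "spans" key dropped from every record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        record.pop("spans", None)  # no-op if the record has no spans
        yield json.dumps(record, ensure_ascii=False)


# Hypothetical sample records, for illustration only.
sample = [
    '{"text": "Concert in Berlin on 5 May"}',
    '{"text": "Concert in Berlin", "spans": [{"start": 11, "end": 17, "label": "LOCATION"}]}',
]
cleaned = list(strip_spans(sample))
for line in cleaned:
    print(line)
```

To apply it to the real data, read datasetfile.jsonl line by line, write the yielded lines to a new file (keeping the original as a backup), and point ner.teach at the new file.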

Thanks for updating – looks like this was the problem then! Prodigy should already ignore pre-defined spans, but I think what happened in your case was that it first tried to make sure all "spans" contain a "text", and that’s where it hit a bug. I’ve already fixed this internally and the fix should be included in the next release :slightly_smiling_face: