Clearer error message for mistyped dataset name

I tried to run ner.batch-train but made a typo in the dataset name and saw this error:

$ pgy ner.batch-train bogus-dataset en --output-model operator.unsegmented.model --label MY_LABEL --unsegmented
Using 1 labels: MY_LABEL

Loaded model en
Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/anaconda3/lib/python3.6/site-packages/prodigy/__main__.py", line 254, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 152, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/anaconda3/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/anaconda3/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 376, in batch_train
    examples = merge_spans(DB.get_dataset(dataset))
  File "cython_src/prodigy/models/ner.pyx", line 21, in prodigy.models.ner.merge_spans
TypeError: object of type 'NoneType' has no len()

It would be better to check whether the dataset exists and, if it doesn’t, exit with an error message saying so.
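Something like the following is what I have in mind. It is only a rough sketch, not Prodigy's actual code: `get_dataset_or_fail`, `DatasetNotFoundError` and `FakeDB` are names I made up for illustration, and it assumes `get_dataset` returns `None` for an unknown name, which is what the traceback above suggests.

```python
class DatasetNotFoundError(ValueError):
    """Hypothetical error type for a dataset name that is not in the database."""


def get_dataset_or_fail(db, name):
    """Return the examples stored under `name`, or raise a readable error.

    Assumption: db.get_dataset(name) returns None when the dataset
    does not exist, as the TypeError in the traceback implies.
    """
    examples = db.get_dataset(name)
    if examples is None:
        raise DatasetNotFoundError(
            "Can't find dataset '{}' in the database. "
            "Check the name for typos.".format(name)
        )
    return examples


class FakeDB:
    """Stand-in for the real database object, for illustration only."""

    def __init__(self, datasets):
        self._datasets = datasets

    def get_dataset(self, name):
        # dict.get returns None for a missing key, mimicking the assumed behaviour
        return self._datasets.get(name)


db = FakeDB({"real-dataset": [{"text": "hello"}]})
try:
    get_dataset_or_fail(db, "bogus-dataset")
except DatasetNotFoundError as err:
    # The user sees the dataset name instead of a TypeError from merge_spans
    print(err)
```

In the recipe, the check would run before the examples are passed to `merge_spans`, so the typo is reported before any further processing starts.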

This is version 1.4.0.

Thanks, good point! We mostly try to do this, but it looks like this one slipped through. We'll update this for the next release :blush:

This still exists. I spent 10 minutes trying to figure out what was wrong with my code when it was actually a mistyped dataset name. :man_facepalming: Just bumping this so that it does not slip through again. :slight_smile: