ner.batch-train results in KeyError

ivyleavedtoadflax · December 31, 2018, 4:50pm

Hi and Happy New Year!

I’m having an issue running ner.batch-train to train a model on a dataset I annotated using Prodigy. I get the following traceback:

$ prodigy ner.batch-train ner_train_0.1.5 en_core_web_sm
Traceback (most recent call last):
  File "/home/matthew/.pyenv/versions/3.6.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/matthew/.pyenv/versions/3.6.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/matthew/.virtualenvs/prodigy_utils/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 253, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/matthew/.virtualenvs/prodigy_utils/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/matthew/.virtualenvs/prodigy_utils/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/matthew/.virtualenvs/prodigy_utils/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 426, in batch_train
    examples = list(split_sentences(model.orig_nlp, examples))
  File "cython_src/prodigy/components/preprocess.pyx", line 38, in split_sentences
  File "cython_src/prodigy/components/preprocess.pyx", line 143, in prodigy.components.preprocess._add_tokens
KeyError: 73

The annotations contain a mix of three standard NER entities from spaCy and one new entity type that I have labelled from scratch. Could this be a tokenisation error?

Info about spaCy

spaCy version      2.0.18         
prodigy version    1.6.1
Location           /home/matthew/.virtualenvs/prodigy_utils/lib/python3.6/site-packages/spacy
Platform           Linux-4.15.0-42-generic-x86_64-with-debian-buster-sid
Python version     3.6.6

ivyleavedtoadflax · January 1, 2019, 8:31pm

Ahh, just realised that using --unsegmented solves the problem.
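
In case it helps anyone hitting the same traceback: the failure is inside split_sentences / _add_tokens, so one possible cause (a guess, not confirmed) is a span whose character offsets don't line up with token boundaries. Here is a minimal sketch for spotting such spans. It assumes each example is a dict with "tokens" and "spans" lists carrying "start" and "end" character offsets, and it only checks examples that already have tokens stored. The find_misaligned_spans helper and the sample data are made up for illustration and are not part of Prodigy.

```python
def find_misaligned_spans(examples):
    """Return (example_index, span) pairs whose character offsets do not
    match any token boundary stored on the same example.

    Hypothetical helper: assumes each example is a dict with optional
    "tokens" and "spans" lists that use "start"/"end" character offsets.
    """
    problems = []
    for i, eg in enumerate(examples):
        tokens = eg.get("tokens")
        if not tokens:
            # Without stored tokens there is nothing to compare against.
            continue
        starts = {t["start"] for t in tokens}
        ends = {t["end"] for t in tokens}
        for span in eg.get("spans", []):
            if span["start"] not in starts or span["end"] not in ends:
                problems.append((i, span))
    return problems


# Made-up example data, not taken from the dataset in this thread.
examples = [
    {
        "text": "Apple hired Tim Cook.",
        "tokens": [
            {"text": "Apple", "start": 0, "end": 5, "id": 0},
            {"text": "hired", "start": 6, "end": 11, "id": 1},
            {"text": "Tim", "start": 12, "end": 15, "id": 2},
            {"text": "Cook", "start": 16, "end": 20, "id": 3},
            {"text": ".", "start": 20, "end": 21, "id": 4},
        ],
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},       # aligned with "Apple"
            {"start": 13, "end": 20, "label": "PERSON"},  # starts inside "Tim"
        ],
    }
]

for index, span in find_misaligned_spans(examples):
    print(index, span)
```

If the stored examples have no tokens, they would need to be tokenised with the same model first. Any spans this flags are the ones worth re-checking in the annotation data.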

ines · January 2, 2019, 4:47pm

Thanks for the update and sorry about that – I’m pretty sure we already fixed this problem for the upcoming release!
