Hi.
I’m trying to add training data I’ve programmatically converted from another source. The results of the conversion are lines in a JSONL file that have this format:
{"text": "The Iowa medical examiner's office said a woman killed by a train at an Ames crossing in March committed suicide.", "spans": [{"start": 9, "end": 25, "label": "JOB"}]}
Importing them into an existing dataset seems to work fine:
(prodigy) eb% prodigy db-in wiki_test auto_training.jsonl
✨ Imported 615 annotations for 'wiki_test' to database SQLite
Added 'accept' answer to 615 annotations
Session ID: 2019-04-10_15-16-12
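If it helps with debugging, I can also pull the imported examples back out of the database and post a few. I was planning to do that along these lines (assuming connect() and get_dataset() are the right calls here):

from prodigy.components.db import connect

db = connect()                          # same database settings the CLI uses
examples = db.get_dataset("wiki_test")
print(len(examples))                    # expecting the 615 imported annotations plus whatever was already there
print(examples[-1])                     # one of the freshly imported records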
But when I try to train on that dataset, it falls over with a traceback:
(prodigy) eb% prodigy ner.batch-train wiki_test en_core_web_lg --output model --label JOB
Using 1 labels: JOB
Loaded model en_core_web_lg
Using 20% of accept/reject examples (898) for evaluation
Traceback (most recent call last):
File "/misc/pyenv/versions/3.6.8/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/misc/pyenv/versions/3.6.8/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/misc/pyenv/versions/prodigy/lib/python3.6/site-packages/prodigy/__main__.py", line 331, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 211, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/misc/pyenv/versions/prodigy/lib/python3.6/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/misc/pyenv/versions/prodigy/lib/python3.6/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/misc/pyenv/versions/prodigy/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 521, in batch_train
examples = list(split_sentences(model.orig_nlp, examples))
File "cython_src/prodigy/components/preprocess.pyx", line 37, in split_sentences
File "cython_src/prodigy/components/preprocess.pyx", line 164, in prodigy.components.preprocess._add_tokens
KeyError: 89
I'm stuck now because I can't retrain.
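My best guess is that some converted spans don't line up with spaCy's token boundaries and that's what _add_tokens is choking on, but that's only a guess. This is the check I was going to run next, in case it points at the problem (en_core_web_lg is the same model I'm training with):

import json
import spacy

nlp = spacy.load("en_core_web_lg")

with open("auto_training.jsonl", encoding="utf8") as f:
    for i, line in enumerate(f, 1):
        eg = json.loads(line)
        doc = nlp.make_doc(eg["text"])
        for span in eg.get("spans", []):
            # char_span() returns None when the offsets don't fall on token boundaries
            if doc.char_span(span["start"], span["end"]) is None:
                print(f"line {i}: span {span} doesn't align with the tokenization")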