ner.batch-train error split_sentences after updating prodigy

Ben · October 17, 2018, 12:51am

I have just started working with version 1.6. When testing ner.batch-train, an error occurs. I am using exactly the same dataset I previously used with Prodigy 1.5.1.

Initially the following warnings appear (related to spacy):

...\Python\Python36\lib\importlib_bootstrap.py:219: RuntimeWarning: cymem.cymem.Pool size changed, may indicate binary incompatibility. Expected 48 from C header, got 64 from PyObject
return f(*args, **kwds)
...\Python\Python36\lib\importlib_bootstrap.py:219: RuntimeWarning: cymem.cymem.Address size changed, may indicate binary incompatibility. Expected 24 from C header, got 40 from PyObject
return f(*args, **kwds)
...\Python\Python36\lib\importlib_bootstrap.py:219: RuntimeWarning: cymem.cymem.Pool size changed, may indicate binary incompatibility. Expected 48 from C header, got 64 from PyObject
return f(*args, **kwds)
...\Python\Python36\lib\importlib_bootstrap.py:219: RuntimeWarning: cymem.cymem.Address size changed, may indicate binary incompatibility. Expected 24 from C header, got 40 from PyObject
return f(*args, **kwds)

Then the model is loaded and after a while the following error message is displayed:

File "...\Python\Python36\lib\site-packages\plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "...\Python\Python36\lib\site-packages\prodigy\recipes\ner.py", line 426, in batch_train
examples = list(split_sentences(model.orig_nlp, examples))
File "cython_src\prodigy\components\preprocess.pyx", line 52, in split_sentences
KeyError: 'token_start'

Do you have an idea how to solve this?
Thanks a lot!

honnibal · October 17, 2018, 5:41am

Sorry about this. Over the weekend we were working to get binary wheels up for spaCy and its dependent packages, which required pushing new versions. A small change to the memory pool, cymem, introduced a version incompatibility because it’s a compile-time dependency. Unfortunately pip hasn’t been resolving the versions the way we expected, which meant that the existing versions of spaCy and Prodigy would install into an inconsistent state. To fix this, we’ve had to push forward with the release of Prodigy 1.6 without as much testing as we would have liked.

I think adding the following line to your batch train recipe should work around the problem:

examples = add_tokens(nlp, examples)
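For context, the KeyError suggests that the spans in the examples carry character offsets but no 'token_start' / 'token_end' keys. The following is only a rough, self-contained sketch of what such a preprocessing step produces. It uses a whitespace tokenizer instead of the spaCy tokenizer, and the helper names add_token_offsets and spans_missing_token_offsets are made up for illustration; this is not Prodigy's actual implementation. It also assumes 'token_end' is the index of the last token in the span.

```python
import re


def add_token_offsets(example):
    """Attach whitespace tokens and token_start/token_end to each span.

    Hypothetical stand-in for a tokenizer-based preprocessing step;
    it is not Prodigy's implementation.
    """
    text = example["text"]
    tokens = [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]
    # Map character offsets to token indices.
    starts = {t["start"]: t["id"] for t in tokens}
    ends = {t["end"]: t["id"] for t in tokens}
    spans = []
    for span in example.get("spans", []):
        span = dict(span)  # copy so the input example is not mutated
        # Only spans aligned to token boundaries receive token indices.
        if span["start"] in starts and span["end"] in ends:
            span["token_start"] = starts[span["start"]]
            span["token_end"] = ends[span["end"]]
        spans.append(span)
    return {**example, "tokens": tokens, "spans": spans}


def spans_missing_token_offsets(examples):
    """Return indices of examples that have a span without token offsets."""
    return [
        i
        for i, eg in enumerate(examples)
        if any(
            "token_start" not in s or "token_end" not in s
            for s in eg.get("spans", [])
        )
    ]


example = {
    "text": "Apple hired Ben in Berlin",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],
}
fixed = add_token_offsets(example)
```

Running spans_missing_token_offsets over an exported dataset is one way to see which examples would trip the same KeyError before training.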

We’re working on the cymem warning, as I do think it’s likely to be problematic. If you install spaCy from source with pip uninstall spacy; pip install spacy --no-binary :all:, do you still see the error? Are you on a 32-bit build of Python, or a 64-bit build?
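If you're not sure which build you have, a quick check along these lines should tell you. It only uses the standard library; python_bitness is just an illustrative helper name, not something from spaCy or Prodigy.

```python
import platform
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


print(f"Python {platform.python_version()} ({python_bitness()}-bit)")
# Cross-check: sys.maxsize exceeds 2**32 only on 64-bit builds.
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

Run it with the same interpreter you use for Prodigy, since several Python installations can coexist on one machine.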

Ben · October 18, 2018, 12:16am

Thanks for the reply.
a) I am using a 64-bit build. After reinstalling spaCy, I still see the error.
b) Unfortunately, the workaround might need additional lines.

Replacing the line causes the following error:

File "...\Python\Python36\lib\site-packages\prodigy\recipes\ner.py", line 428, in batch_train
evals = list(split_sentences(model.orig_nlp, evals))
File "cython_src\prodigy\components\preprocess.pyx", line 52, in split_sentences
KeyError: 'token_start'

I tried to do the same for 'evals' and added:

evals = add_tokens(nlp, evals)

This causes further errors:

File "cython_src\prodigy\core.pyx", line 253, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "...\Python\Python36\lib\site-packages\plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "...\Python\Python36\lib\site-packages\plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "...\Python\Python36\lib\site-packages\prodigy\recipes\ner.py", line 426, in batch_train
examples = list(split_sentences(model.orig_nlp, examples))
File "cython_src\prodigy\components\preprocess.pyx", line 52, in split_sentences
KeyError: 'token_start'

ines · October 18, 2018, 5:48am

Could you try the new version, v1.6.1?

Ben · October 18, 2018, 5:45pm

Perfect, it works again. Thanks for the quick version update!
