ner.batch-train error split_sentences after updating prodigy

I am now working with version 1.6. When I test ner.batch-train, an error occurs. I am using exactly the same dataset I previously used with Prodigy 1.5.1.

Initially the following warnings appear (related to spaCy):

…\Python\Python36\lib\importlib\_bootstrap.py:219: RuntimeWarning: cymem.cymem.Pool size changed, may indicate binary incompatibility. Expected 48 from C header, got 64 from PyObject
return f(*args, **kwds)
…\Python\Python36\lib\importlib\_bootstrap.py:219: RuntimeWarning: cymem.cymem.Address size changed, may indicate binary incompatibility. Expected 24 from C header, got 40 from PyObject
return f(*args, **kwds)
…\Python\Python36\lib\importlib\_bootstrap.py:219: RuntimeWarning: cymem.cymem.Pool size changed, may indicate binary incompatibility. Expected 48 from C header, got 64 from PyObject
return f(*args, **kwds)
…\Python\Python36\lib\importlib\_bootstrap.py:219: RuntimeWarning: cymem.cymem.Address size changed, may indicate binary incompatibility. Expected 24 from C header, got 40 from PyObject
return f(*args, **kwds)

Then the model is loaded and after a while the following error message is displayed:

File "…\Python\Python36\lib\site-packages\plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "…\Python\Python36\lib\site-packages\prodigy\recipes\ner.py", line 426, in batch_train
examples = list(split_sentences(model.orig_nlp, examples))
File "cython_src\prodigy\components\preprocess.pyx", line 52, in split_sentences
KeyError: 'token_start'

Do you have an idea how to solve this?
Thanks a lot!

Sorry about this. Over the weekend we were working to get binary wheels up for spaCy and its dependent packages, which required pushing new versions. A small change to the memory pool, cymem, introduced a version incompatibility because it's a compile-time dependency. Unfortunately pip hasn't been resolving the versions the way we expected, which meant that the existing versions of spaCy and Prodigy would install into an inconsistent state. To fix this, we've had to push forward with the release of Prodigy 1.6 without as much testing as we would have liked.

I think adding the following line to your batch train recipe should work around the problem:

examples = add_tokens(nlp, examples)
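
For context, the KeyError suggests that split_sentences expects token-level information (a token_start key) that annotations carrying only character offsets would lack. The sketch below is a pure-Python stand-in, not Prodigy's add_tokens: the helper name add_token_offsets, the whitespace tokenizer, and the inclusive token_end convention are all assumptions made for illustration. It only shows the kind of data the workaround is meant to attach.

```python
# Hypothetical stand-in for illustration only; this is NOT Prodigy's add_tokens.
# Assumptions: whitespace tokenization, inclusive token_end index.
def add_token_offsets(examples):
    """Attach tokens to each example and token indices to each span."""
    for eg in examples:
        tokens = []
        pos = 0
        for i, word in enumerate(eg["text"].split()):
            start = eg["text"].index(word, pos)
            end = start + len(word)
            tokens.append({"text": word, "start": start, "end": end, "id": i})
            pos = end
        eg["tokens"] = tokens
        # Lookup tables from character offsets to token indices.
        starts = {t["start"]: t["id"] for t in tokens}
        ends = {t["end"]: t["id"] for t in tokens}
        for span in eg.get("spans", []):
            # Only spans that align with token boundaries get token indices.
            if span["start"] in starts and span["end"] in ends:
                span["token_start"] = starts[span["start"]]
                span["token_end"] = ends[span["end"]]
        yield eg


examples = [
    {
        "text": "Apple hired Tim Cook",
        "spans": [{"start": 12, "end": 20, "label": "PERSON"}],
    }
]
# The helper is a generator, so materialize it before reusing the list.
examples = list(add_token_offsets(examples))
print(examples[0]["spans"][0])
```

Because the helper is a generator, wrapping the call in list() matters wherever the result is iterated more than once.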

We're working on the cymem warning, as I do think it's likely to be problematic. If you install spaCy from source with pip uninstall spacy; pip install spacy --no-binary :all:, do you still see the error? Are you on a 32-bit build of Python, or a 64-bit build?
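
If you're not sure which build you have, a quick check along these lines should tell you. It uses only the standard library; the variable names are just for illustration.

```python
import struct
import sys

# Size of a C pointer in bits: 32 on a 32-bit build, 64 on a 64-bit build.
pointer_bits = struct.calcsize("P") * 8

# Cross-check: sys.maxsize exceeds 2**32 only on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(f"Python {sys.version.split()[0]}, {pointer_bits}-bit build")
```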

Thanks for the reply.
a) I am using a 64-bit build. After reinstalling spaCy, I still see the error.
b) Unfortunately, the workaround might need additional lines.

Replacing the line causes the following error:

File "…\Python\Python36\lib\site-packages\prodigy\recipes\ner.py", line 428, in batch_train
evals = list(split_sentences(model.orig_nlp, evals))
File "cython_src\prodigy\components\preprocess.pyx", line 52, in split_sentences
KeyError: 'token_start'

I tried to do the same for 'evals' and added:

evals = add_tokens(nlp, evals)

This causes further errors:

File "cython_src\prodigy\core.pyx", line 253, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "…\Python\Python36\lib\site-packages\plac_core.py", line 328, in __call__
cmd, result = parser.consume(arglist)
File "…\Python\Python36\lib\site-packages\plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "…\Python\Python36\lib\site-packages\prodigy\recipes\ner.py", line 426, in batch_train
examples = list(split_sentences(model.orig_nlp, examples))
File "cython_src\prodigy\components\preprocess.pyx", line 52, in split_sentences
KeyError: 'token_start'

Could you try the new version, v1.6.1?

Perfect, it works again. Thanks for the quick version update!
