ValueError: [E008] Some current components would be lost when restoring previous pipeline state.

Every time I train the model with the --binary flag, I get the following error:

$ prodigy train ner subjects en_core_web_sm --n-iter 1 --dropout 0.2 --binary --output models/company_mentions_sm
✔ Loaded model 'en_core_web_sm'
Using 906 train / 226 eval (split 20%)
Component: ner | Batch size: compounding | Dropout: 0.2 | Iterations: 1
ℹ Baseline accuracy: 0.039

=========================== ✨  Training the model ===========================

#    Loss       Skip    Right   Wrong   Accuracy
--   --------   -----   -----   -----   --------
 1       2.33       0      89     138      0.392

Correct     89
Incorrect   138
Baseline    0.039
Accuracy    0.392

Traceback (most recent call last):
  File "/Users/quetzal/.pyenv/versions/3.7.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/quetzal/.pyenv/versions/3.7.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/quetzal/.pyenv/versions/company-mentions-dataset/lib/python3.7/site-packages/prodigy/__main__.py", line 53, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 321, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/quetzal/.pyenv/versions/company-mentions-dataset/lib/python3.7/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Users/quetzal/.pyenv/versions/company-mentions-dataset/lib/python3.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/quetzal/.pyenv/versions/company-mentions-dataset/lib/python3.7/site-packages/prodigy/recipes/train.py", line 194, in train
    disabled.restore()
  File "/Users/quetzal/.pyenv/versions/company-mentions-dataset/lib/python3.7/site-packages/spacy/language.py", line 1139, in restore
    raise ValueError(Errors.E008.format(names=unexpected))
ValueError: [E008] Some current components would be lost when restoring previous pipeline state. If you added components after calling `nlp.disable_pipes()`, you should remove them explicitly with `nlp.remove_pipe()` before the pipeline is restored. Names of the new components: ['sentencizer']

Without the binary flag, everything works just fine. I'm using Prodigy 1.10.8. Could you please help me with this issue?

Hi, and sorry about that – that's strange :thinking: Could you try something and see if reverting the following change here solves the problem for you?
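
For background, E008 is raised when a component gets added after nlp.disable_pipes() and is still in the pipeline when the previous state is restored – roughly this pattern (a minimal sketch to illustrate the error, not the actual recipe code):

import spacy

nlp = spacy.load("en_core_web_sm")
# Snapshot the pipeline state, keeping only the NER enabled
disabled = nlp.disable_pipes("tagger", "parser")
# A component added afterwards isn't part of that snapshot...
nlp.add_pipe(nlp.create_pipe("sentencizer"))
# ...so restoring the previous state raises E008 for 'sentencizer'
disabled.restore()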

Hi Ines,
I am using Prodigy version 1.10.7, but when I start to train the model in --binary mode, the error reappears for me.
I have already tried moving the annot_model = ... line above the disabled = ... line in the train recipe, but even that does not solve the error.
Any workarounds for this? I am trying to create an active learning pipeline.

Can you try upgrading to v1.10.8? I think that version includes a change that might be relevant here.
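
(In case it's useful: upgrading is just a reinstall from the download server, with your own license key in place of the placeholder.)

pip install prodigy==1.10.8 -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy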

I'm using 1.10.8 but had the same issue; manually adding the sentencizer to the model's pipeline before starting binary training worked, though.
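
For reference, this is roughly what I did (spaCy v2.x, as shipped with Prodigy 1.10.x; the output path is just an example):

import spacy

nlp = spacy.load("en_core_web_sm")
if "sentencizer" not in nlp.pipe_names:
    # Add the sentencizer up front so it's already part of the pipeline
    # before the train recipe disables and restores components
    nlp.add_pipe(nlp.create_pipe("sentencizer"), first=True)
nlp.to_disk("models/en_core_web_sm_sentencizer")

Then point the train command at the saved directory instead of the package name:

prodigy train ner subjects models/en_core_web_sm_sentencizer --n-iter 1 --dropout 0.2 --binary --output models/company_mentions_sm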

Thanks for the update, that's strange – but glad to hear there's a manual workaround.

The upcoming version (currently available as a nightly pre-release) will definitely resolve this problem, since it makes the separate binary training workflow obsolete: it now uses the same process for training from binary and manual annotations, including the ability to train from a mix of binary and manual datasets.
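
In the nightly, the train command takes the output directory first and the datasets per component, so the equivalent call would look roughly like this (a sketch – the exact flags may still change before the stable release):

prodigy train ./models/company_mentions_sm --ner subjects --base-model en_core_web_sm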