Failure to Create New Model

I'm trying to start training a blank model, but I'm blocked because I can't initialize a new one. I'm on Prodigy 1.7.1 (due to my license) and spaCy 2.0.17 (if I read correctly, models from spaCy versions later than 2.0.17 are incompatible with Prodigy 1.7.1). When I try to initialize a new model, I get the following error:

Creating model...
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/__main__.py", line 31, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/cli/init_model.py", line 51, in init_model
    nlp = create_model(lang, probs, oov_prob, clusters, vectors_data, vector_keys, prune_vectors)
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/cli/init_model.py", line 93, in create_model
    for word in vector_keys:
TypeError: 'NoneType' object is not iterable

I've tried initializing a new model, removing the NER pipe from an existing model, and creating a new model through nlp = spacy.blank('en') and then saving it to disk. Any guidance on how to either download a compatible blank model or initialize one without it failing?

Are you running spaCy's init-model CLI command here? And what are your inputs? It looks like you have None vectors in the vectors data you're initializing with.
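To make the last line of that traceback concrete: the TypeError is what Python raises whenever a loop is handed None instead of a sequence. The sketch below reproduces only that mechanism. The function collect_keys is a hypothetical stand-in and is not spaCy's create_model.

```python
def collect_keys(vector_keys):
    """Hypothetical stand-in for a function that loops over vector keys."""
    # Iterating over None fails here, just like `for word in vector_keys:`
    # in the traceback above.
    return [word for word in vector_keys]


try:
    collect_keys(None)  # no vectors were supplied
except TypeError as err:
    message = str(err)
    print(message)
```

Passing an actual list of keys to the same function works, which is why the command only fails when no vectors are provided.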

If you just want to save out a blank model, you won't need to run that command – that's really intended for creating a blank model with vectors etc. The following should be all you need:

import spacy
nlp = spacy.blank("en")  # or whichever language
nlp.to_disk("/path/to/model")

Or as a one-liner:

python -c "import spacy;spacy.blank('en').to_disk('/path/to/model')"

Just make sure to use the same spaCy version you're using in your Prodigy environment to prevent conflicts.
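One way to confirm which versions are installed in the active environment is sketched below. It assumes Python 3.8 or later, because importlib.metadata is not available on older interpreters such as 3.6; there, pip show spacy or printing spacy.__version__ gives the same information. The helper name installed_version is made up for this example.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the two packages that need to be compatible.
for name in ("spacy", "prodigy"):
    print(name, installed_version(name) or "not installed")
```

Run it with the same interpreter that launches Prodigy, so the versions reported are the ones Prodigy actually sees.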

Thanks for correcting me on the spaCy init-model part. I've tried both the one-liner and the script version, and I'm still coming up empty. Either option writes a model to disk without errors, but when I run prodigy ner.teach recipe_parse ./models/recipe_model ./data/ingredients.txt, the server boots up fine and then, when I load the page, the console gives me:

  File "cython_src/prodigy/components/sorters.pyx", line 151, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 61, in genexpr
  File "cython_src/prodigy/models/ner.pyx", line 292, in __call__
  File "cython_src/prodigy/models/ner.pyx", line 259, in get_tasks
  File "cytoolz/itertoolz.pyx", line 1047, in cytoolz.itertoolz.partition_all.__next__
  File "cython_src/prodigy/models/ner.pyx", line 217, in predict_spans
  File "cython_src/prodigy/models/ner.pyx", line 60, in prodigy.models.ner._BatchBeam.__init__
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/language.py", line 192, in entity
    return self.get_pipe('ner')
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/language.py", line 215, in get_pipe
    raise KeyError(Errors.E001.format(name=name, opts=self.pipe_names))
KeyError: "[E001] No component 'ner' found in pipeline. Available names: ['sbd']"

After that error, I searched and found the thread "Problem using a custom Word2Vec model", which makes sense. I ran the script again, this time adding the new pipe, and now just running the script gives me:

Traceback (most recent call last):
  File "scripts/remove_pipe.py", line 5, in <module>
    nlp.to_disk('models/recipe_model')
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/language.py", line 615, in to_disk
    util.to_disk(path, serializers, {p: False for p in disable})
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/util.py", line 503, in to_disk
    writer(path / key)
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/language.py", line 613, in <lambda>
    serializers[name] = lambda p, proc=proc: proc.to_disk(p, vocab=False)
  File "nn_parser.pyx", line 893, in spacy.syntax.nn_parser.Parser.to_disk
  File "/Users/sharkmaul/.pyenv/versions/3.6.4/lib/python3.6/site-packages/spacy/util.py", line 503, in to_disk
    writer(path / key)
  File "nn_parser.pyx", line 884, in spacy.syntax.nn_parser.Parser.to_disk.lambda3
TypeError: 'bool' object is not subscriptable

This seems related to: https://github.com/explosion/spaCy/issues/2482 (?)

Also, while researching, I saw a few posts saying that the weights need to be initialized with begin_training.

I tried:

import spacy
nlp = spacy.blank('en')
nlp.begin_training()
nlp.to_disk('models/recipe_model')

But I got this: Warning: Unnamed vectors -- this won't allow multiple vectors models to be loaded. (Shape: (0, 0))

I'm on:
OSX 10.14.5
Spacy 2.0.17
Prodigy 1.7.1
Python 3.6.4

That first error is expected – if you run ner.teach, your pipeline needs an ner component. Typically, that's a pretrained NER model, but you can also use a blank one (although that's going to be less effective).
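A quick way to check a saved model before pointing ner.teach at it is to look at its meta.json. The sketch below assumes the model directory was written by nlp.to_disk and that meta.json lists the component names under a "pipeline" key, as spaCy 2.x does. The helper names saved_pipeline and has_ner are made up for this example.

```python
import json
from pathlib import Path


def saved_pipeline(model_dir):
    """Return the component names listed in a saved model's meta.json."""
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    # Assumption: component names are stored under the "pipeline" key.
    return meta.get("pipeline", [])


def has_ner(model_dir):
    """True if the saved pipeline lists an 'ner' component."""
    return "ner" in saved_pipeline(model_dir)
```

For example, has_ner("models/recipe_model") should come back False for a model saved straight from spacy.blank("en"), which matches the missing component reported in the KeyError.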

If you want to create a blank model with a blank NER component, you can start off with a blank model, add the NER component, initialize its weights (by calling begin_training) and then save the result to disk.

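import spacy

nlp = spacy.blank("en")       # blank English pipeline, no components yet
ner = nlp.create_pipe("ner")  # create an untrained NER component
nlp.add_pipe(ner)             # add it to the pipeline
nlp.begin_training()          # initialize the component's weights
nlp.to_disk("models/recipe_model")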

The "Unnamed vectors" message is just a warning (and we likely fixed the cause of this in a later version of spaCy). You should be able to just ignore it, since vectors are not going to be relevant in your case.