Problem using a custom Word2Vec model

I am having a little difficulty using a Word2Vec model I created. Model creation seemed to work, but when I try to use the model I get an error I don’t really know how to address: KeyError: "[E001] No component 'ner' found in pipeline. Available names: ['sbd']". I created the model from scratch because the text I am using consists of English comments that are highly technical, heavily abbreviated, and full of acronyms.

    prodigy terms.train-vectors /Users/jdalgliesh/prodigy/model /Users/jdalgliesh/prodigy/remarks.txt
    .
    . All the other train-vectors output
    .
    .
    19:20:24 - EPOCH - 2 : training on 12417098 raw words (2691458 effective words) took 131.5s, 20464 effective words/s
    19:20:24 - training on a 24834196 raw words (5382544 effective words) took 261.2s, 20607 effective words/s

      ✨  Trained Word2Vec model
      /Users/jdalgliesh/prodigy/model

    (/anaconda/envs/NER) bash-3.2$ prodigy dataset boem

      ✨  Successfully added 'boem' to database SQLite.

    (/anaconda/envs/NER) bash-3.2$ prodigy ner.teach boem /Users/jdalgliesh/prodigy/model daily_oil.json --label "Tripping, Circulating, Drilling, Cementing, Logging, Perforation, Testing, Rigging, StuckPipe, LostCirculation, Kick, Fishing, Ballooning, Waiting, casing_hole, depth" --patterns patternsActivitiesProblems.jsonl
    Using 15 labels: Tripping, Circulating, Drilling, Cementing, Logging, Perforation Testing, Rigging, StuckPipe, LostCirculation, Kick, Fishing, Ballooning, Waiting, casing_hole, depth
    Traceback (most recent call last):
      File "/anaconda/envs/NER/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/anaconda/envs/NER/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/anaconda/envs/NER/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
        controller = recipe(*args, use_plac=True)
      File "cython_src/prodigy/core.pyx", line 178, in prodigy.core.recipe.recipe_decorator.recipe_proxy
      File "cython_src/prodigy/core.pyx", line 55, in prodigy.core.Controller.__init__
      File "/anaconda/envs/NER/lib/python3.6/site-packages/toolz/itertoolz.py", line 368, in first
        return next(iter(seq))
      File "cython_src/prodigy/core.pyx", line 84, in iter_tasks
      File "cython_src/prodigy/components/sorters.pyx", line 136, in __iter__
      File "cython_src/prodigy/components/sorters.pyx", line 51, in genexpr
      File "cython_src/prodigy/util.pyx", line 303, in predict
      File "/anaconda/envs/NER/lib/python3.6/site-packages/toolz/itertoolz.py", line 234, in interleave
        yield next(itr)
      File "cython_src/prodigy/models/ner.pyx", line 265, in __call__
      File "cython_src/prodigy/models/ner.pyx", line 233, in get_tasks
      File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
      File "cython_src/prodigy/models/ner.pyx", line 197, in predict_spans
      File "cython_src/prodigy/models/ner.pyx", line 57, in prodigy.models.ner._BatchBeam.__init__
      File "/anaconda/envs/NER/lib/python3.6/site-packages/spacy/language.py", line 198, in entity
        return self.get_pipe('ner')
      File "/anaconda/envs/NER/lib/python3.6/site-packages/spacy/language.py", line 221, in get_pipe
        raise KeyError(Errors.E001.format(name=name, opts=self.pipe_names))
    KeyError: "[E001] No component 'ner' found in pipeline. Available names: ['sbd']"
    (/anaconda/envs/NER) bash-3.2$

Ah, I think the problem here is that you’re using ner.teach, which expects the spaCy model to have an entity recognizer (i.e. an 'ner' pipeline component), even if it’s empty. The easiest way to fix this is to add the component to the nlp object before you save it:

ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)

To add it to your existing model, you could just do the following:

nlp = spacy.load('/Users/jdalgliesh/prodigy/model')
nlp.add_pipe(nlp.create_pipe('ner'))
nlp.to_disk('/Users/jdalgliesh/prodigy/model-with-empty-ner')

Now that ner.teach supports patterns and starting from scratch, we could consider just adding a “blank” entity recognizer automatically if none is found. I guess the reason we let Prodigy / spaCy raise an explicit error here is that loading a model without an entity recognizer is often a mistake and can lead to various other confusing side effects down the line.
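
Before starting ner.teach, it can help to check which components a saved model actually declares. The sketch below is not part of Prodigy or spaCy. It assumes the model directory contains a meta.json file with a "pipeline" list (which is what spaCy writes when a model is saved to disk), and the helper name missing_components is made up for illustration.

```python
import json
from pathlib import Path


def missing_components(model_dir, required=("ner",)):
    """Return the required pipeline names that meta.json does not list.

    Hypothetical helper: assumes <model_dir>/meta.json exists and has a
    "pipeline" key holding a list of component names.
    """
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    # Treat a missing or null "pipeline" entry as an empty pipeline.
    available = meta.get("pipeline") or []
    return [name for name in required if name not in available]


# Example call (path taken from the thread):
# missing_components('/Users/jdalgliesh/prodigy/model')
```

For the model in this thread, which only lists 'sbd', the helper would be expected to report 'ner' as missing, matching the E001 error above.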

Thanks for the reply. I think this is getting a little above my experience level, so I am going to reach out to a consultant to help me with this work. I will pass along the reply! I am a domain expert trying to break into NLP, and tools like yours are making that a lot easier.

Jeff,

I had more or less the same problem:

  1. If the w2v file is a .bin, use Gensim to save it as txt:
    from gensim.models import KeyedVectors
    w2v = KeyedVectors.load_word2vec_format('./data/PubMed-w2v.bin', binary=True)
    w2v.save_word2vec_format('./data/PubMed.txt', binary=False)
  2. Create a spaCy model:
    $ spacy init-model en ./folder-to-export-to --vectors-loc ./data/PubMed.txt
  3. You now have an empty model with vectors and no NER. The next steps are:
  4. Create some gold annotations (GOLD) by running the provided spaCy model on your text (if you want the already existing NER).
  5. Export this GOLD and use it to train your new model with the batch train example from the spaCy website.
  6. You should now have NER added to your new model. Test it and improve it (you can then retrain it in Prodigy).
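
A quick sanity check on the converted text file can save a failed init-model run. The word2vec text format starts with a header line holding the number of vectors and their dimensionality, followed by one line per word. The sketch below only reads that header and the first entry. The function name and the checks are illustrative assumptions, not part of Gensim or spaCy.

```python
def read_word2vec_header(path):
    """Return (n_vectors, n_dims) from the first line of a word2vec text file.

    Hypothetical helper: also checks that the first vector line has
    exactly n_dims values after the word.
    """
    with open(path, encoding="utf8") as f:
        header = f.readline().split()
        if len(header) != 2:
            raise ValueError("Expected a '<count> <dims>' header line")
        n_vectors, n_dims = int(header[0]), int(header[1])
        # Each following line is: word value_1 ... value_n_dims
        first = f.readline().rstrip().split(" ")
        if len(first) - 1 != n_dims:
            raise ValueError("First vector does not match the declared dimensionality")
    return n_vectors, n_dims


# Example call (path taken from the steps above):
# read_word2vec_header('./data/PubMed.txt')
```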

I think it’s a good idea to add this to the empty models, as I would assume more people will face this issue. I think the documentation for training NER states somewhere that one can start with an empty language model.

@ines I tried to run your code on an existing model but got an error message.

nlp = spacy.load('/Users/jdalgliesh/prodigy/model')
nlp.add_pipe(nlp.create_pipe('ner'))
nlp.to_disk('/Users/jdalgliesh/prodigy/model-with-empty-ner')

The error message:

Traceback (most recent call last):
    File "add_ner.py", line 5, in <module>
        nlp.to_disk('../w2v_nowac_spacy_ner')
    File "/Users/arash/.pyenv/versions/prodigy/lib/python3.6/site-packages/spacy/language.py", line 621, in to_disk
        util.to_disk(path, serializers, {p: False for p in disable})
    File "/Users/arash/.pyenv/versions/prodigy/lib/python3.6/site-packages/spacy/util.py", line 503, in to_disk
        writer(path / key)
    File "/Users/arash/.pyenv/versions/prodigy/lib/python3.6/site-packages/spacy/language.py", line 619, in <lambda>
        serializers[name] = lambda p, proc=proc: proc.to_disk(p, vocab=False)
    File "nn_parser.pyx", line 892, in spacy.syntax.nn_parser.Parser.to_disk
    File "/Users/arash/.pyenv/versions/prodigy/lib/python3.6/site-packages/spacy/util.py", line 503, in to_disk
        writer(path / key)
    File "nn_parser.pyx", line 883, in spacy.syntax.nn_parser.Parser.to_disk.lambda3
TypeError: 'bool' object is not subscriptable

Ah, sorry – does the following work?

nlp.add_pipe(nlp.create_pipe('ner'))
nlp.begin_training()

Thanks Ines. That helped for ner, tagger and dep!

This fixed the problem for me:

python3 -c "import spacy;nlp = spacy.load('/path/to/model/');nlp.add_pipe(nlp.create_pipe('ner'));nlp.begin_training();nlp.to_disk('/path/to/model/')"