Problem using a custom Word2Vec model

maplebay · June 12, 2018, 2:45am

I am having a little difficulty using a Word2Vec model I created. The model creation seemed to work, but when I try to use it I get an error I don’t really know how to address. The error is: KeyError: "[E001] No component 'ner' found in pipeline. Available names: ['sbd']". I created the model from scratch because the texts I am using are English comments, but they are highly technical, heavily abbreviated and full of acronyms.

    prodigy terms.train-vectors /Users/jdalgliesh/prodigy/model /Users/jdalgliesh/prodigy/remarks.txt
    .
    . All the other train-vectors output
    .
    .
    19:20:24 - EPOCH - 2 : training on 12417098 raw words (2691458 effective words) took 131.5s, 20464 effective words/s
    19:20:24 - training on a 24834196 raw words (5382544 effective words) took 261.2s, 20607 effective words/s

      ✨  Trained Word2Vec model
      /Users/jdalgliesh/prodigy/model

    (/anaconda/envs/NER) bash-3.2$ prodigy dataset boem

      ✨  Successfully added 'boem' to database SQLite.

    (/anaconda/envs/NER) bash-3.2$ prodigy ner.teach boem /Users/jdalgliesh/prodigy/model daily_oil.json --label "Tripping, Circulating, Drilling, Cementing, Logging, Perforation, Testing, Rigging, StuckPipe, LostCirculation, Kick, Fishing, Ballooning, Waiting, casing_hole, depth" --patterns patternsActivitiesProblems.jsonl
    Using 15 labels: Tripping, Circulating, Drilling, Cementing, Logging, Perforation Testing, Rigging, StuckPipe, LostCirculation, Kick, Fishing, Ballooning, Waiting, casing_hole, depth
    Traceback (most recent call last):
      File "/anaconda/envs/NER/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/anaconda/envs/NER/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/anaconda/envs/NER/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
        controller = recipe(*args, use_plac=True)
      File "cython_src/prodigy/core.pyx", line 178, in prodigy.core.recipe.recipe_decorator.recipe_proxy
      File "cython_src/prodigy/core.pyx", line 55, in prodigy.core.Controller.__init__
      File "/anaconda/envs/NER/lib/python3.6/site-packages/toolz/itertoolz.py", line 368, in first
        return next(iter(seq))
      File "cython_src/prodigy/core.pyx", line 84, in iter_tasks
      File "cython_src/prodigy/components/sorters.pyx", line 136, in __iter__
      File "cython_src/prodigy/components/sorters.pyx", line 51, in genexpr
      File "cython_src/prodigy/util.pyx", line 303, in predict
      File "/anaconda/envs/NER/lib/python3.6/site-packages/toolz/itertoolz.py", line 234, in interleave
        yield next(itr)
      File "cython_src/prodigy/models/ner.pyx", line 265, in __call__
      File "cython_src/prodigy/models/ner.pyx", line 233, in get_tasks
      File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
      File "cython_src/prodigy/models/ner.pyx", line 197, in predict_spans
      File "cython_src/prodigy/models/ner.pyx", line 57, in prodigy.models.ner._BatchBeam.__init__
      File "/anaconda/envs/NER/lib/python3.6/site-packages/spacy/language.py", line 198, in entity
        return self.get_pipe('ner')
      File "/anaconda/envs/NER/lib/python3.6/site-packages/spacy/language.py", line 221, in get_pipe
        raise KeyError(Errors.E001.format(name=name, opts=self.pipe_names))
    KeyError: "[E001] No component 'ner' found in pipeline. Available names: ['sbd']"
    (/anaconda/envs/NER) bash-3.2$

ines · June 12, 2018, 8:18am

Ah, I think the problem here is that you’re using ner.teach, which expects the spaCy model to have an entity recognizer (i.e. an 'ner' pipeline component), even if it’s empty. The easiest way to fix this is to add the component to the nlp object before you save it:

    ner = nlp.create_pipe('ner')
    nlp.add_pipe(ner)

To add it to your existing model, you could just do the following:

    import spacy

    nlp = spacy.load('/Users/jdalgliesh/prodigy/model')
    nlp.add_pipe(nlp.create_pipe('ner'))
    nlp.to_disk('/Users/jdalgliesh/prodigy/model-with-empty-ner')

Now that ner.teach supports patterns and starting from scratch, we could consider just adding a “blank” entity recognizer automatically if none is found. I guess the reason we let Prodigy / spaCy raise an explicit error here is that loading a model without an entity recognizer is often a mistake and can lead to various other confusing side effects down the line.
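
Before running ner.teach, it can help to confirm which components a saved model actually contains. The sketch below uses only the standard library and rests on one assumption: that the model directory written by nlp.to_disk contains a meta.json file listing component names under a "pipeline" key. The helper names saved_pipeline and has_component are made up for illustration.

```python
import json
from pathlib import Path


def saved_pipeline(model_dir):
    """Return the component names recorded in a saved model's meta.json."""
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    # Assumption: component names are stored as a list under "pipeline".
    return list(meta.get("pipeline", []))


def has_component(model_dir, name="ner"):
    """True if the saved model lists a component with the given name."""
    return name in saved_pipeline(model_dir)
```

For the model in this thread, has_component('/Users/jdalgliesh/prodigy/model') would be expected to return False, since the error message reports only 'sbd' in the pipeline.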

maplebay · June 22, 2018, 3:27am

Thanks for the reply. I think this is getting a little above my experience level, so I am going to reach out to a consultant to help me with this work. I will pass along the reply! I am a domain expert trying to break into NLP, and tools like yours are making that a lot easier.

idealley · June 22, 2018, 4:37pm

Jeff,

I had more or less the same problem.

If the w2v file is a .bin, just use Gensim to save it as text:

    from gensim.models import KeyedVectors
    w2v = KeyedVectors.load_word2vec_format('./data/PubMed-w2v.bin', binary=True)
    w2v.save_word2vec_format('./data/PubMed.txt', binary=False)

Then create a spaCy model:

    $ spacy init-model en ./folder-to-export-to --vectors-loc ./data/PubMed.txt

You now have an empty model with vectors and no NER. The next steps would be:
Create some gold annotations on your text using the provided spaCy model (if you want the already existing NER).
Export this gold data and use it to train your new model with the batch train example from the spaCy website.
You should now have NER added to your new model. Test it and improve it (you can use it to retrain it in Prodigy).

arashsa · July 3, 2018, 3:22pm

I think it’s a good idea to add this to the empty models, as I would assume more people will face this issue. In the documentation for training NER, I think you state somewhere that one can start with an empty language model.

arashsa · July 3, 2018, 3:44pm

@ines I tried to run your code on an existing model but got an error message.

    nlp = spacy.load('/Users/jdalgliesh/prodigy/model')
    nlp.add_pipe(nlp.create_pipe('ner'))
    nlp.to_disk('/Users/jdalgliesh/prodigy/model-with-empty-ner')

The error message:

    Traceback (most recent call last):
      File "add_ner.py", line 5, in <module>
        nlp.to_disk('../w2v_nowac_spacy_ner')
      File "/Users/arash/.pyenv/versions/prodigy/lib/python3.6/site-packages/spacy/language.py", line 621, in to_disk
        util.to_disk(path, serializers, {p: False for p in disable})
      File "/Users/arash/.pyenv/versions/prodigy/lib/python3.6/site-packages/spacy/util.py", line 503, in to_disk
        writer(path / key)
      File "/Users/arash/.pyenv/versions/prodigy/lib/python3.6/site-packages/spacy/language.py", line 619, in <lambda>
        serializers[name] = lambda p, proc=proc: proc.to_disk(p, vocab=False)
      File "nn_parser.pyx", line 892, in spacy.syntax.nn_parser.Parser.to_disk
      File "/Users/arash/.pyenv/versions/prodigy/lib/python3.6/site-packages/spacy/util.py", line 503, in to_disk
        writer(path / key)
      File "nn_parser.pyx", line 883, in spacy.syntax.nn_parser.Parser.to_disk.lambda3
    TypeError: 'bool' object is not subscriptable

ines · July 3, 2018, 4:32pm

Ah, sorry – does the following work?

    nlp.add_pipe(nlp.create_pipe('ner'))
    nlp.begin_training()
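
Putting the pieces from this thread together, a minimal sketch of the whole fix might look like the following. It assumes the spaCy v2-style API used above (create_pipe, add_pipe with a component object, begin_training); the function names ensure_ner and add_ner_to_saved_model are hypothetical, and the paths in the usage note are placeholders. As far as I know, begin_training initialises the weights of every component in the pipeline, so this is meant for models that have no trained components yet.

```python
def ensure_ner(nlp):
    """Add a blank 'ner' component if the pipeline has none.

    Returns True if a component was added, False if one already existed.
    """
    if "ner" in nlp.pipe_names:
        return False
    nlp.add_pipe(nlp.create_pipe("ner"))
    # Initialise the new component's weights so that to_disk() can
    # serialise it (saving without this step raised the TypeError above).
    nlp.begin_training()
    return True


def add_ner_to_saved_model(src_dir, dst_dir):
    """Load a saved model, make sure it has 'ner', and save it to dst_dir."""
    # Imported here so ensure_ner stays usable without importing spaCy.
    import spacy

    nlp = spacy.load(src_dir)
    added = ensure_ner(nlp)
    nlp.to_disk(dst_dir)
    return added
```

Usage would be something like add_ner_to_saved_model('/path/to/model', '/path/to/model-with-empty-ner'), after which the new directory can be passed to ner.teach in place of the original model.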

aniruddha · August 21, 2018, 3:35pm

Thanks Ines. That helped for ner, tagger and dep!

Nick · August 24, 2018, 11:05am

This fixed the problem for me:

    python3 -c "import spacy;nlp = spacy.load('/path/to/model/');nlp.add_pipe(nlp.create_pipe('ner'));nlp.begin_training();nlp.to_disk('/path/to/model/')"
