Loading gensim word2vec vectors for terms.teach?

You should definitely be able to load your pre-trained vectors. I’m not sure the code in that StackOverflow thread applies to the current version of spaCy.

Fundamentally, you can always add vectors to spaCy as follows. Say you have a list of word strings and a parallel sequence of vectors. You can do:

# shape is (number of entries, vector width)
nlp.vocab.reset_vectors(shape=(len(word_strings), len(vectors[0])))
for i, string in enumerate(word_strings):
    nlp.vocab.set_vector(string, vectors[i])

This might be slow for a large number of vectors, but you should only have to do it once. After loading in your vectors, save out the nlp object with nlp.to_disk(). You can then pass that directory to Prodigy as the model argument.
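Here’s a minimal sketch of the full round trip with gensim’s KeyedVectors. The file paths and output directory are placeholders, and index2word assumes gensim 3.x (it’s index_to_key in gensim 4+):

import spacy
from gensim.models import KeyedVectors

# Placeholder path: your word2vec file (binary or text format)
kv = KeyedVectors.load_word2vec_format("my_vectors.bin", binary=True)
nlp = spacy.load("en_core_web_sm")
nlp.vocab.reset_vectors(shape=(len(kv.index2word), kv.vector_size))
for word in kv.index2word:
    nlp.vocab.set_vector(word, kv[word])
nlp.to_disk("./model_with_vectors")  # placeholder output directory

You can then point terms.teach at the saved directory (the dataset name and seed terms here are placeholders too):

prodigy terms.teach my_dataset ./model_with_vectors --seeds "word1,word2"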

If you’re using your own pre-trained vectors, take care not to use the md or lg spaCy data packs. Those models use the pre-trained GloVe vectors as features, so if you swap in your own vectors, the activations will be different from what the model expects and you’ll get terrible results. The sm model doesn’t use pre-trained vectors, precisely to make it easy to swap in your own.
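If in doubt, you can check whether a loaded model ships with a vectors table. A minimal sketch:

import spacy

nlp = spacy.load("en_core_web_sm")
# An empty table, e.g. (0, 0), means no pre-trained vectors;
# md/lg models should report a large table here instead
print(nlp.vocab.vectors.shape)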

You might also be interested in the terms.train-vectors recipe. It uses Gensim to train vectors on a text corpus and saves out the model for use with spaCy. It should serve as a working example of how that’s done.
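Usage is roughly as follows; the output directory and corpus path are placeholders, and the exact arguments may differ by Prodigy version, so check prodigy terms.train-vectors --help:

prodigy terms.train-vectors ./vectors_model /path/to/corpus.jsonl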
