Add vectors to nlp model using terms.train-vectors

damiano · April 9, 2018, 2:49pm

Hello,
I am going to train word2vec using terms.train-vectors. I know I can load a new w2v model after spacy.load, but I would like to incorporate the model directly. Is that possible?

Thanks

damiano · April 9, 2018, 5:09pm

I am using the default settings. Ubuntu killed the script: it was using 30 GB of RAM!
The source is 500 MB.

Should I train the model with gensim directly?
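If you do train with gensim directly, one way to avoid holding the whole corpus in RAM is to stream it from disk, one sentence at a time. Below is a minimal sketch, assuming a plain-text corpus with one sentence per line and naive whitespace tokenization. `StreamedSentences` is a hypothetical name. gensim ships its own `LineSentence` class in `gensim.models.word2vec` that does much the same job.

```python
import os
import tempfile


class StreamedSentences:
    """Restartable iterable over a one-sentence-per-line text file.

    Word2Vec makes several passes over the corpus (one to build the vocab,
    then one per epoch), so this must be an iterable, not a one-shot
    generator. Only one line is held in memory at a time.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf8") as f:
            for line in f:
                tokens = line.split()  # naive whitespace tokenization (assumption)
                if tokens:  # skip blank lines
                    yield tokens


if __name__ == "__main__":
    # Tiny demo file standing in for the real corpus
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf8") as tmp:
        tmp.write("the cat sat\n\nthe dog ran\n")
        path = tmp.name
    sentences = StreamedSentences(path)
    print(list(sentences))            # first pass
    print(sum(1 for _ in sentences))  # a second pass works too
    os.remove(path)

# With gensim 3.x installed you would then train with something like:
#   from gensim.models import Word2Vec
#   w2v = Word2Vec(sentences, size=300, min_count=5, workers=4)
# (gensim 4.0 renamed `size` to `vector_size`.)
```

The class form matters more than the tokenizer: a plain generator would be exhausted after the vocabulary pass, so gensim would see an empty corpus during training.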

beckerfuffle · April 9, 2018, 9:17pm

Not sure if this will help you, but I think I found a bug in the recipe:

damiano · April 10, 2018, 6:25am

@beckerfuffle Yes! That's the problem.

At the moment I am using gensim directly. Do you know how we can integrate the w2v model into the Language object?

honnibal · April 10, 2018, 10:25am

@damiano The way we do it in terms.train-vectors should work. You can add this to your script:


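```python
# Assumes `nlp` is a spaCy v2 Language object, `w2v` a trained gensim (< 4.0)
# Word2Vec model, `size` its vector dimension and `output_model` a directory path.
nlp.vocab.reset_vectors(width=size)  # clear the existing vectors and set the new width
for word in w2v.wv.vocab:
    # copy each word2vec vector into the spaCy vocab
    nlp.vocab.set_vector(word, w2v.wv.word_vec(word))
nlp.to_disk(output_model)  # save the pipeline, including the new vectors
```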

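This should save a spaCy model directory that you can then load directly with spacy.load(). You can also start from an nlp object that already has pre-trained models, in which case you'll end up with a model that has both the pre-trained pipeline and your pre-trained vectors. If you do that, make sure you're using an sm model: the sm models don't ship with word vectors, so their components don't depend on the vectors you are replacing.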
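If your gensim version no longer has `w2v.wv.vocab` (it was removed in gensim 4.0), or you'd rather not keep the gensim model in the same process, one alternative is to export the vectors with gensim's `save_word2vec_format(..., binary=False)` and stream them back from that file into honnibal's loop. Below is a minimal stdlib sketch, assuming the standard word2vec text format. The helper names `read_header` and `iter_vectors` are hypothetical. The spaCy calls in the trailing comment are the same v2 ones used in the snippet above.

```python
import os
import tempfile


def read_header(path):
    """Return (n_words, width) from the first line of a word2vec text file."""
    with open(path, encoding="utf8") as f:
        n_words, width = map(int, f.readline().split())
    return n_words, width


def iter_vectors(path):
    """Stream (word, vector) pairs from a word2vec *text* format file.

    Line 1 is "<n_words> <width>"; every other line is a word followed by
    `width` space-separated floats. Only one line is held in memory at a time.
    """
    with open(path, encoding="utf8") as f:
        _, width = map(int, f.readline().split())
        for lineno, line in enumerate(f, start=2):
            if not line.strip():
                continue  # tolerate blank lines
            word, *values = line.rstrip().split(" ")
            vector = [float(v) for v in values]
            if len(vector) != width:
                raise ValueError(f"line {lineno}: expected {width} values, got {len(vector)}")
            yield word, vector


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf8") as tmp:
        tmp.write("2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
        path = tmp.name
    print(read_header(path))
    for word, vector in iter_vectors(path):
        print(word, vector)
    os.remove(path)

# Hooking this into the spaCy v2 loop (requires numpy and spacy):
#   w2v.wv.save_word2vec_format("vectors.txt", binary=False)  # gensim
#   _, width = read_header("vectors.txt")
#   nlp.vocab.reset_vectors(width=width)
#   for word, vector in iter_vectors("vectors.txt"):
#       nlp.vocab.set_vector(word, numpy.asarray(vector, dtype="float32"))
#   nlp.to_disk(output_model)
```

The width check fails early on a truncated or malformed file instead of letting a mismatched vector reach `set_vector`.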
