Add vectors to nlp model using terms.train-vectors

Hello,
I am going to train word2vec vectors using terms.train-vectors. I know I can load a new w2v model after spacy.load, but I would like to incorporate the vectors into the model directly. Is that possible?

Thanks

I am using the default settings. Ubuntu killed the script; it was using 30 GB of RAM!
The source corpus is 500 MB.

Should I train the model with gensim directly?

Not sure if this will help you, but I think I found a bug in the recipe:

@beckerfuffle Yes! That's the problem.

At the moment I am using gensim directly. Do you know how we can integrate the w2v model into a spaCy Language object?
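
For reference, something like this keeps memory under control (the path and parameters are just placeholders), because gensim's LineSentence streams the corpus from disk instead of loading it all at once:

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Stream the corpus from disk, one whitespace-tokenized sentence per line,
# so the 500 MB source never has to sit in memory all at once.
sentences = LineSentence("corpus.txt")  # placeholder path
size = 300  # vector width, placeholder value
w2v = Word2Vec(sentences, size=size, window=5, min_count=5, workers=4)
w2v.save("w2v.model")  # placeholder path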

@damiano The way we do it in terms.train-vectors should work. You can add this to your script:


# Assumes `nlp` is a loaded spaCy pipeline, `w2v` is a trained gensim
# Word2Vec model, `size` is the vector width and `output_model` is a path.
nlp.vocab.reset_vectors(width=size)
for word in w2v.wv.vocab:
    nlp.vocab.set_vector(word, w2v.wv.word_vec(word))
nlp.to_disk(output_model)

This should save you a spaCy model directory that you can then load directly. You can start from an nlp object loaded from a pre-trained package, in which case you'll get the pre-trained pipeline components together with your new vectors. If you do that, make sure you're using an sm model: the sm packages don't ship with their own word vectors, so their components weren't trained to rely on vectors you'd be replacing.
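
Once that's saved, you can load it back like any other model directory and check that the vectors are there, e.g. (output_model is the path used above, "apple" is just an example token):

import spacy

nlp = spacy.load(output_model)      # the directory written by nlp.to_disk above
print(nlp.vocab.vectors.shape)      # (number of vectors, vector width)
print(nlp("apple")[0].has_vector)   # True if "apple" received a vector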
