Hello,
I am going to train word vectors using terms.train-vectors. I know I can load a new w2v model after spacy.load, but I would like to incorporate the model directly. Is that possible?
Thanks
I am using the default settings. Ubuntu killed the script: it was using 30 GB of RAM!
The source is 500 MB.
Should I train the model with gensim directly?
Not sure if this will help you but I think I found a bug in the recipe:
@beckerfuffle Yes! That's the problem.
At the moment I am using gensim directly. Do you know how we can integrate the w2v model into Language?
@damiano The way we do it in terms.train-vectors should work. You can add this to your script:
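# size is the vector width the w2v model was trained with
nlp.vocab.reset_vectors(width=size)
# gensim < 4.0 API; in gensim 4.x iterate over w2v.wv.index_to_key instead
for word in w2v.wv.vocab:
    nlp.vocab.set_vector(word, w2v.wv.word_vec(word))
# write a spaCy model directory containing the new vectors
nlp.to_disk(output_model)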
This should save a spaCy model directory that you can then load directly with spacy.load. You can also start from an nlp object that already has pre-trained models, in which case the saved model will have both the pre-trained models and your vectors. If you do that, make sure you're using an sm model, since those ship without word vectors of their own.
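For anyone who wants the whole thing as a standalone script, here is a minimal sketch of the same idea. The names copy_vectors and build_spacy_model and the paths passed to them are hypothetical, made up for illustration, and it assumes the gensim model was saved with Word2Vec.save. It is not the recipe's actual code, just one way to wire the steps above together.

```python
from pathlib import Path


def copy_vectors(vocab, words, get_vector, width):
    """Reset the vocab's vector table, then copy one vector per word into it.

    `vocab` only needs reset_vectors(width=...) and set_vector(word, vector),
    which is what spaCy's Vocab provides. Returns the number of words copied.
    """
    vocab.reset_vectors(width=width)
    count = 0
    for word in words:
        vocab.set_vector(word, get_vector(word))
        count += 1
    return count


def build_spacy_model(w2v_path, output_dir, lang="en"):
    """Load a saved gensim Word2Vec model and write a spaCy model directory.

    Hypothetical helper: the arguments are placeholders for your own paths.
    """
    # Imported lazily so copy_vectors() can be used without either library.
    import spacy
    from gensim.models import Word2Vec

    wv = Word2Vec.load(str(w2v_path)).wv
    # gensim 4.x exposes index_to_key; older releases used index2word.
    words = wv.index_to_key if hasattr(wv, "index_to_key") else wv.index2word

    nlp = spacy.blank(lang)  # blank pipeline: tokenizer and vocab only
    copy_vectors(nlp.vocab, words, wv.__getitem__, wv.vector_size)
    nlp.to_disk(Path(output_dir))
    return nlp
```

Calling build_spacy_model("my_w2v.model", "my_spacy_model") (both names are placeholders) should leave a directory you can pass to spacy.load. To keep pre-trained components, swap spacy.blank for spacy.load on an sm model, as described above.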