I’m actually having a bit of trouble figuring this out. In theory it should be simple, but I can’t seem to find exactly what we need. (Incidentally, this is why I hate inheritance…I always feel like it gives me too many places to look.)
On the spaCy side, the three key data members are:
nlp.vocab.vectors.data: A numpy array with the vector data.
nlp.vocab.vectors.key2row: A dict mapping string hashes to rows in the vector table.
spacy.strings.StringStore: A store mapping hashes to strings.
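To make the relationship between those three pieces concrete, here is a minimal sketch that joins them into (word, row) pairs. It uses toy stand-ins rather than a loaded spaCy model: the function name words_by_row, the hash values, and the vectors are all made up for illustration. It only assumes that key2row behaves like a dict of hash to row, and that the string store can be indexed by hash to get the string back.

```python
def words_by_row(key2row, strings):
    """Return (word, row) pairs sorted by row index.

    key2row: mapping of string hash -> row in the vector table.
    strings: anything indexable by hash that returns the string.
    """
    pairs = [(strings[key], row) for key, row in key2row.items()]
    return sorted(pairs, key=lambda pair: pair[1])


# Toy stand-ins for the real objects (hypothetical hashes and values).
key2row = {101: 0, 303: 2, 202: 1}
strings = {101: "apple", 202: "banana", 303: "cherry"}
data = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

pairs = words_by_row(key2row, strings)
for word, row in pairs:
    # Look each vector up by its row instead of assuming rows are contiguous.
    print(word, data[row])
```

With a real pipeline, the same function would presumably be called with nlp.vocab.vectors.key2row and the vocab's string store, and data would be nlp.vocab.vectors.data. Note that more than one key can point at the same row, so a row may show up under several words.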
I think we want to create a gensim.models.keyedvectors.WordEmbeddingsKeyedVectors object. Its superclass initializes self.vectors as a list, but I doubt it’s really a list when the class is used. I thought I’d be able to look at the save and load code, but that’s in another superclass (utils.SaveLoad), and I’m having trouble chasing down how that works.
My guess is that we’ll be able to replace that self.vectors with the numpy array, and then load the keys into the
I can’t find an API method that does what we want, but maybe I’m not looking in the right place. The API reference is organized by class, so to get the complete docs for a given class, you have to visit the docs for the superclasses, and remember which methods are overridden. I know RaRe are working on a new docs system to address this, which I’m sure will be done soon — they’ve been pushing lots of great updates to Gensim lately.
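One possible workaround that avoids touching the class internals is to go through a file instead: write the words and vectors out in the plain-text word2vec format and let Gensim read that. This is a different route from replacing self.vectors directly, and the sketch below is only that, a sketch. The helper name write_word2vec_text and the sample words and vectors are made up for illustration, and it assumes no word contains whitespace, since the format is space-delimited.

```python
import os
import tempfile


def write_word2vec_text(path, words, vectors):
    """Write words and their vectors in the plain-text word2vec format.

    The first line is "<number of words> <dimensions>", followed by one
    line per word: the word, then its vector values separated by spaces.
    """
    rows = [[float(x) for x in vector] for vector in vectors]
    dim = len(rows[0]) if rows else 0
    with open(path, "w", encoding="utf8") as f:
        f.write(f"{len(words)} {dim}\n")
        for word, row in zip(words, rows):
            values = " ".join(repr(x) for x in row)
            f.write(f"{word} {values}\n")


# Toy data standing in for the spaCy words and vector rows.
words = ["hello", "world"]
vectors = [[0.5, 1.0, -2.0], [0.0, 0.25, 3.0]]

path = os.path.join(tempfile.mkdtemp(), "vectors.txt")
write_word2vec_text(path, words, vectors)

with open(path, encoding="utf8") as f:
    lines = f.read().splitlines()
print(lines)
```

A file like this should be readable with gensim.models.KeyedVectors.load_word2vec_format(path, binary=False). The round trip through disk is slower than swapping the array in directly, but it relies only on a documented loader.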