word2vec model .bin

I've downloaded a pre-trained word2vec Twitter model and I was wondering whether it's possible to load that instead of sense2vec vectors or an en_core_web_(any_size) model?
I tried passing it as input and got the following error:

  File "/home/USER/anaconda3/envs/py-earthquakes-new/lib/python3.8/site-packages/spacy/util.py", line 253, in get_model_meta
    raise IOError(Errors.E053.format(path=meta_path))
OSError: [E053] Could not read meta.json from classification/word2vec_twitter_model.bin/meta.json

Thanks in advance!

Which version of spaCy are you using? I'm assuming some version of spaCy 2?

To use vectors in an external format, you'll first have to convert them to a spaCy model; see https://spacy.io/usage/vectors-similarity#converting. Note that your vectors file has to be in one of the formats supported by init-model, though.
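
For context on the E053 error: judging by the traceback, spaCy treats the path it is given as a model directory and looks for a meta.json inside it, which a raw .bin file cannot provide. A minimal sketch of that check, for illustration only (the helper name looks_like_spacy_model is made up and is not part of spaCy):

```python
from pathlib import Path


def looks_like_spacy_model(path) -> bool:
    """Return True if `path` is a directory that contains a meta.json file.

    Hypothetical helper: it only mirrors the meta.json lookup visible in the
    traceback and does not validate the rest of the model directory.
    """
    model_path = Path(path)
    return model_path.is_dir() and (model_path / "meta.json").is_file()
```

A raw word2vec_twitter_model.bin fails this check because it is a single file, whereas the directory written by init-model should pass it.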

from pathlib import Path

from gensim.models import KeyedVectors
from gensim.test.utils import get_tmpfile
import typer

import spacy
from spacy.cli.init_model import init_model


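def unpack_w2v_binary(w2v_path: str) -> str:
    """Convert a binary word2vec file to the text format and return the new path."""
    # get_tmpfile only builds a path inside the system temp directory.
    txt_vec_file = get_tmpfile("vec.txt")
    KeyedVectors.load_word2vec_format(w2v_path, binary=True).save_word2vec_format(
        txt_vec_file, binary=False
    )
    return txt_vec_file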


def build_w2v_spacy_model(
    lang: str, output_dir: str, w2v_path: str, model_name: str
) -> spacy.language.Language:
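    """Thin wrapper around spaCy's init_model, the function behind the init-model CLI command"""
    output_path = Path(output_dir)
    init_model(
        lang,
        output_path,
        vectors_loc=unpack_w2v_binary(w2v_path),
        model_name=model_name,
    )
    # Load the saved model from disk so the function returns the Language
    # object its annotation promises.
    return spacy.load(output_path)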


if __name__ == "__main__":
    # Expose the wrapper (not spaCy's raw init_model) so the .bin file is converted first.
    typer.run(build_w2v_spacy_model)

^ That basically works for building a spaCy model with custom vectors from a .bin file. I think there are some caveats around vocab compatibility, but it's a start.
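
To make the unpacking step less of a black box: the gensim round trip just rewrites the binary word2vec file as text. Below is a rough pure-Python sketch of the same conversion. It assumes the standard word2vec C binary layout (an ASCII header line with vocab size and dimensions, then each word terminated by a space and followed by that many little-endian float32 values). The function name w2v_binary_to_text is made up for illustration, and for real files gensim is the safer choice.

```python
import struct
from typing import BinaryIO, TextIO


def w2v_binary_to_text(src: BinaryIO, dst: TextIO) -> int:
    """Rewrite a word2vec binary stream as word2vec text; return the word count.

    Assumes little-endian float32 values and UTF-8 encoded words.
    """
    # Header line: "<vocab_size> <dimensions>\n" in ASCII.
    vocab_size, dim = (int(x) for x in src.readline().split())
    dst.write(f"{vocab_size} {dim}\n")

    for _ in range(vocab_size):
        # Read the word byte by byte until the separating space.
        word = bytearray()
        while True:
            ch = src.read(1)
            if not ch:
                raise EOFError("file ended while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline some writers put after each vector
                word += ch

        # Read the vector: `dim` float32 values, 4 bytes each.
        raw = src.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("file ended while reading a vector")
        values = struct.unpack(f"<{dim}f", raw)

        dst.write(word.decode("utf-8") + " " + " ".join(repr(v) for v in values) + "\n")

    return vocab_size
```

Usage would look like opening the .bin file with open(path, "rb") and the target with open(out_path, "w", encoding="utf-8"), then passing both to the function. The resulting text file is the kind of input that vectors_loc is given in the script above.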