Convert Gensim FastText to spaCy-readable Word2Vec format for terms.teach recipe

Hey,

I'm trying to work out how to convert my custom Gensim FastText model into a spaCy-readable Word2Vec format for use in Prodigy. I've previously done this for a Gensim Doc2Vec model, however I'm not finding anything on how to achieve this for Gensim's FastText model.

Has anyone done this before?

Thanks in advance!

Darren

Hi! You should be able to use the `init-model` command for this. See the section on converting vectors here:


Hi Ines,

Thanks for the response.

The init-model documentation mentions requiring a .txt, .zip or .tar.gz file to be able to run. However, Gensim's models are saved to .pkl and .npy files.

I considered creating a simple .txt file by iterating through my model's entire vocab and writing one line per token in the form TOKEN VECTOR VALUE1 VALUE2 ... VALUEN. However, it's unclear whether the vector on each line should begin with the values themselves or with a bracket to indicate the start of a list.

Do you know which is the case?

Thanks,

Darren

Found a solution for formatting the .txt file.

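TLDR: the first line of the text file should contain a string of "{} {}".format(VOCAB_SIZE, NDIMS). Each line after that is the token followed by its vector values separated by spaces, with no brackets.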
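In case it helps anyone else, here's a rough sketch of writing that layout. The function name `write_word2vec_text` and the toy vectors are made up for illustration, and it only uses the standard library, so treat it as a starting point rather than the exact code I ran.

```python
import io


def write_word2vec_text(pairs, out):
    """Write (token, vector) pairs to `out` in Word2Vec text format.

    `write_word2vec_text` is a hypothetical helper, not part of Gensim or spaCy.
    """
    # Materialise the pairs so the vocabulary size is known before writing.
    rows = [(token, [float(v) for v in vector]) for token, vector in pairs]
    if not rows:
        raise ValueError("no vectors to write")
    n_dims = len(rows[0][1])
    # Header line: vocabulary size and vector width, separated by a space.
    out.write("{} {}\n".format(len(rows), n_dims))
    for token, vector in rows:
        if len(vector) != n_dims:
            raise ValueError("vector for {!r} has the wrong length".format(token))
        # Token followed by plain space-separated values, no brackets.
        out.write(token + " " + " ".join(repr(v) for v in vector) + "\n")


# Toy data standing in for a real model's vocabulary.
toy_vectors = [("cat", [0.1, 0.2, 0.3]), ("dog", [0.4, 0.5, 0.6])]
buffer = io.StringIO()
write_word2vec_text(toy_vectors, buffer)
word2vec_text = buffer.getvalue()
print(word2vec_text)
```

For a real model you would pass something like `((w, model.wv[w]) for w in model.wv.index_to_key)` and an open file instead of the buffer. That assumes Gensim 4, where the vocab list is `index_to_key`; in Gensim 3 I believe you iterate over `model.wv.vocab` instead. Tokens containing spaces would break the layout, so those need filtering out first. Gensim's KeyedVectors also has a `save_word2vec_format` method that may give you the same file directly, so it's worth checking that against your version.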

Sources for solution:

Thanks for your help Ines!

Darren


Thanks for updating and glad you got it working! :smiley: