Convert Gensim FastText to spaCy-readable Word2Vec format for terms.teach recipe

DGMS90 · September 10, 2020, 9:24am

Hey,

I'm trying to work out how to convert my custom Gensim FastText model into a spaCy-readable Word2Vec format for use in Prodigy. I've previously done this for a Gensim Doc2Vec model, but I can't find anything on how to achieve this for Gensim's FastText model.

Has anyone done this before?

Thanks in advance!

Darren

ines · September 10, 2020, 1:56pm

Hi! You should be able to use the `init-model` command for this. See the section on converting vectors here:

DGMS90 · September 10, 2020, 2:51pm

Hi Ines,

Thanks for the response.

The init-model documentation mentions requiring a .txt, .zip or .tar.gz file to be able to run. However, Gensim's models are saved to .pkl and .npy files.

I considered creating a simple .txt file by iterating through my model's entire vocab and writing one line per token in the form TOKEN VALUE1 VALUE2 ... VALUEN. However, it's unclear whether the vector on each line should begin directly with the values themselves or with a bracket to indicate the start of a list.

Do you know which is the case?

Thanks,

Darren

DGMS90 · September 11, 2020, 6:03am

Found a solution for formatting the .txt file.

TLDR: the first line of the text file should contain the string `"{} {}".format(VOCAB_SIZE, NDIMS)`, i.e. the vocabulary size and the number of vector dimensions separated by a single space.
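For anyone landing here later, this is roughly what writing such a file can look like. It's only a minimal sketch: `write_word2vec_txt` is a hypothetical helper, the toy `cat`/`dog` vectors are made up, and it assumes each following line is the token and then its plain space-separated values, with no brackets. With a real Gensim model, the token-to-vector mapping would have to be built from the model's vocab first. Gensim's KeyedVectors also has a `save_word2vec_format` method that should write the same layout, though I haven't checked it against every FastText version.

```python
import os
import tempfile


def write_word2vec_txt(vectors, path):
    """Write a {token: sequence of floats} mapping as a Word2Vec-style text file.

    Hypothetical helper for illustration. The first line is
    "VOCAB_SIZE NDIMS"; every following line is "TOKEN VALUE1 ... VALUEN".
    Assumes tokens contain no whitespace.
    """
    if not vectors:
        raise ValueError("no vectors to write")
    n_dims = len(next(iter(vectors.values())))
    with open(path, "w", encoding="utf8") as f:
        # Header line: vocabulary size and number of dimensions.
        f.write("{} {}\n".format(len(vectors), n_dims))
        for token, values in vectors.items():
            if len(values) != n_dims:
                raise ValueError("inconsistent vector length for " + repr(token))
            # Plain space-separated values, no brackets or commas.
            f.write(token + " " + " ".join(str(float(v)) for v in values) + "\n")


# Toy data standing in for the tokens and vectors of a trained model.
toy_vectors = {"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}
out_path = os.path.join(tempfile.mkdtemp(), "vectors.txt")
write_word2vec_txt(toy_vectors, out_path)

# Read the file back to inspect what was written.
with open(out_path, encoding="utf8") as f:
    written_lines = f.read().splitlines()
print(written_lines)
```

The resulting .txt file is what gets passed to `init-model` as the vectors location (the `--vectors-loc` option in spaCy v2, as far as I can tell).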

Sources for solution:

Thanks for your help Ines!

Darren

ines · September 11, 2020, 10:11am

Thanks for updating and glad you got it working!
