I would like to use a word2vec model trained with gensim to generate words/phrases similar to a given set of seed words/phrases.
To make this work smoothly, does the gensim word2vec model need to have the same format as the "s2v_reddit_2015_md/" model? If so, how can I convert the gensim model into a format compatible with sense2vec? Or is there another way to achieve this goal?
Hi! A sense2vec model is essentially just a word2vec model trained on words/phrases with concatenated POS tags or entity labels. But to query it, the sense2vec library provides various methods for looking up words together with their tags etc.
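To illustrate what "concatenated tags" means, here is a minimal sketch. It assumes the common sense2vec key convention of joining phrase tokens with underscores and appending the tag after a "|" (e.g. "New_York|GPE"). The helpers `make_sense_key` and `split_sense_key` are hypothetical, written only for this example, and are not the sense2vec API. The point is that a plain word2vec vocabulary only contains surface forms, so a tagged lookup has nothing to match.

```python
from typing import Tuple


def make_sense_key(phrase: str, tag: str) -> str:
    """Hypothetical helper: build a sense2vec-style key such as 'New_York|GPE'."""
    return phrase.replace(" ", "_") + "|" + tag


def split_sense_key(key: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'New_York|GPE' back into ('New York', 'GPE')."""
    phrase, tag = key.rsplit("|", 1)
    return phrase.replace("_", " "), tag


# A sense2vec-style table stores one vector per (phrase, tag) pair...
sense_vocab = {"duck|NOUN", "duck|VERB", "New_York|GPE"}
# ...while a plain word2vec vocabulary only has the surface forms.
plain_vocab = {"duck", "New_York"}

query = make_sense_key("duck", "NOUN")
print(query)                 # duck|NOUN
print(query in sense_vocab)  # True
print(query in plain_vocab)  # False: there is no tag to query by
```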
If you just have a regular w2v model, using it via the sense2vec library doesn't make much sense, and many of the assumptions in the sense2vec.teach workflow don't really hold either, because your vectors can't be queried by tag. I think a better solution would be one of the following:
Write your own recipe script that loads your vectors, calculates the average of the vectors of all seed terms, then finds the most similar entries in your table and sends them out. See here for the most_similar implementation in spaCy.
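The core of such a recipe could look roughly like the sketch below. It uses plain Python and a toy dict of vectors so it runs on its own; with a real gensim model you would get the vectors from a `KeyedVectors` object (e.g. via `KeyedVectors.load(...)`), and, as far as I know, gensim's own `kv.most_similar(positive=seeds, topn=10)` already does something very similar (mean of the unit-normalized seed vectors, excluding the seeds from the results). The function names and the toy vectors here are hypothetical, and wrapping this in an actual Prodigy recipe that streams the suggestions out is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def average_vector(vectors: Dict[str, Vector], seeds: Sequence[str]) -> Vector:
    """Element-wise mean of the unit-normalized vectors of all known seed terms."""
    seed_vecs = [normalize(vectors[s]) for s in seeds if s in vectors]
    if not seed_vecs:
        raise KeyError("none of the seed terms are in the vocabulary")
    dim = len(seed_vecs[0])
    return [sum(v[i] for v in seed_vecs) / len(seed_vecs) for i in range(dim)]


def most_similar_to_seeds(
    vectors: Dict[str, Vector], seeds: Sequence[str], n: int = 10
) -> List[Tuple[str, float]]:
    """Rank all non-seed terms by cosine similarity to the averaged seed vector."""
    target = normalize(average_vector(vectors, seeds))
    seed_set = set(seeds)
    scores = []
    for term, vec in vectors.items():
        if term in seed_set:
            continue  # don't suggest the seed terms themselves
        cosine = sum(a * b for a, b in zip(target, normalize(vec)))
        scores.append((term, cosine))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Toy vectors standing in for a real gensim model (hypothetical data).
vectors = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "kitten": [0.95, 0.05, 0.0],
    "car": [0.0, 0.0, 1.0],
    "truck": [0.0, 0.1, 0.9],
}
for term, score in most_similar_to_seeds(vectors, ["cat", "dog"], n=3):
    print(term, round(score, 3))
```

Averaging normalized vectors keeps one seed with a large norm from dominating the query; you could also score each candidate against every seed separately and combine the scores, at the cost of more comparisons.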