Prodigy sense2vec.teach recipe with gensim w2vec

fsa · March 4, 2021, 1:42pm

Hi,

Based on the sense2vec.teach recipe:

prodigy sense2vec.teach food_terms data/s2v_reddit_2015_md/ --seeds "garlic, avocado, cottage cheese, olive oil, cumin, chicken breast,beef, iceberg lettuce"

I would like to use a word2vec model trained with gensim to generate words/phrases similar to the given seed words/phrases.

To make this work smoothly, does the gensim w2vec model need to have the same format as the "s2v_reddit_2015_md/" model? If so, how can I convert the gensim w2vec model so it's compatible with sense2vec? Or is there another way to achieve this?

ines · March 5, 2021, 12:14am

Hi! A sense2vec model is essentially just a word2vec model trained on words/phrases with concatenated POS tags or entity labels. To query such a model, the sense2vec library includes various methods that let you look up words together with their tags.
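To make the "word2vec with tags" idea concrete, here is a toy sketch. It assumes keys of the form text|TAG, with spaces in phrases replaced by underscores (e.g. cottage_cheese|NOUN). The helpers `make_key`, `split_key` and `senses` are hypothetical re-implementations written for illustration only (the sense2vec library has its own), and the vectors are random placeholders rather than trained embeddings.

```python
import re
from typing import Dict, List, Tuple

import numpy as np


# Hypothetical helpers illustrating the assumed "text|TAG" key convention.
# They are NOT the sense2vec library's API.
def make_key(text: str, tag: str) -> str:
    # Whitespace inside a phrase becomes "_", then the tag is appended after "|".
    return re.sub(r"\s", "_", text) + "|" + tag


def split_key(key: str) -> Tuple[str, str]:
    # Split on the last "|" so the tag is always the final field.
    text, tag = key.rsplit("|", 1)
    return text.replace("_", " "), tag


# A toy "sense2vec" table: an ordinary word2vec-style mapping, except that the
# same surface form gets a separate vector per tag. Vectors are random placeholders.
rng = np.random.default_rng(0)
table: Dict[str, np.ndarray] = {
    make_key(text, tag): rng.normal(size=4)
    for text, tag in [("duck", "NOUN"), ("duck", "VERB"), ("cottage cheese", "NOUN")]
}


def senses(table: Dict[str, np.ndarray], text: str) -> List[str]:
    """Return every tag stored in the table for a given word or phrase."""
    return [split_key(k)[1] for k in table if split_key(k)[0] == text]


print(make_key("cottage cheese", "NOUN"))
print(senses(table, "duck"))
```

With a plain word2vec table the keys would just be duck or cottage_cheese, so there is no tag to filter on, which is why tag-based lookups don't carry over to a regular model.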

If you just have a regular w2v model, using it via the sense2vec library doesn't make much sense, and many of the assumptions in the sense2vec.teach workflow wouldn't hold up either, because your vectors can't be queried by tag. I think a better solution would be one of the following:

- Add your word vectors to a blank spaCy pipeline and use it with terms.teach.
- Write your own recipe script that loads your vectors, calculates the average of the vectors of all seed terms, and then finds the most similar entries in your table and sends them out. For reference, see the most_similar implementation in spaCy (Vectors.most_similar).
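The core of the second option can be sketched in plain NumPy. This is a minimal sketch, not Prodigy's or spaCy's implementation: `most_similar_to_seeds` is a hypothetical helper, and it assumes the vectors are already loaded as a word list plus a matrix (with gensim 4.x these would be `kv.index_to_key` and `kv.vectors` on a `KeyedVectors` object). Wiring it into a Prodigy recipe, e.g. yielding each term as a {"text": term} task, is left out.

```python
from typing import List, Sequence, Tuple

import numpy as np


def most_similar_to_seeds(
    words: Sequence[str], vectors: np.ndarray, seeds: Sequence[str], n: int = 10
) -> List[Tuple[str, float]]:
    """Rank vocabulary entries by cosine similarity to the mean seed vector.

    words:   one string per row of `vectors`
    vectors: (V, D) array of word/phrase vectors
    seeds:   seed terms; any that are missing from `words` are skipped
    """
    index = {w: i for i, w in enumerate(words)}
    seed_rows = [index[s] for s in seeds if s in index]
    if not seed_rows:
        raise ValueError("None of the seed terms are in the vocabulary")
    # Normalise every row so that dot products become cosine similarities.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)
    # Average the normalised seed vectors into one query vector, then renormalise.
    query = unit[seed_rows].mean(axis=0)
    query /= max(np.linalg.norm(query), 1e-12)
    scores = unit @ query
    # Exclude the seeds themselves, then keep the n best-scoring entries.
    scores[seed_rows] = -np.inf
    top = np.argsort(-scores)[:n]
    return [(words[i], float(scores[i])) for i in top if np.isfinite(scores[i])]


# Usage with a tiny made-up vocabulary (values are illustrative only).
words = ["garlic", "cumin", "onion", "paprika", "laptop", "keyboard"]
vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.8, 0.3, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
])
for term, score in most_similar_to_seeds(words, vectors, ["garlic", "cumin"], n=3):
    print(term, round(score, 3))
```

If the vectors stay in gensim, `KeyedVectors.most_similar(positive=seeds, topn=n)` performs essentially the same mean-of-normalised-vectors ranking, so the helper above is mainly useful when the table lives somewhere else.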

fsa · March 5, 2021, 9:47am

Thanks a lot!

This is what I was looking for. I have complicated scientific text, and using sense2vec gave poor results. The POS tags don't work well on it.

ines · March 6, 2021, 3:44am

Glad to hear! And yeah, the sense2vec vectors we trained were trained on Reddit text, which is pretty far from scientific text.
