custom sense2vec

Hi all,
@ines
I hope you are OK and healthy. I have two types of questions: a theoretical one and an implementation one.

1- Theoretical: how could I use sense2vec on a specific corpus (a book with 7,000 sentences) to add more semantic information to my word vectors or my custom NER model?

2- Implementation: how should I implement it? I followed the guide on training your own sense2vec vectors, but I got a bit lost. I was able to run this command on my specific corpus:

python -m prodigy sense2vec.teach data_merged_v22 C:/Users/moha/Documents/Models/s2v_old --seeds "since"

Here I used "since" as the seed and was looking for other cue words related to causation. However, I know that you have also used sense2vec to improve NER.

I would be happy if you could give me some hints, mainly on how to train my own sense2vec vectors and how to use them to add more semantic information to my model.

I would be very thankful if someone could give me some ideas. Many thanks!

If you want to train your model, the link you shared is the way to go – you need to follow the steps and run the scripts for preprocessing, and then use either FastText or GloVe to train the vectors: https://github.com/explosion/sense2vec#-training-your-own-sense2vec-vectors
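To make the preprocessing step a bit more concrete: as far as I understand the sense2vec README, the preprocessing scripts turn each parsed sentence into a line of keys of the form text|SENSE, where merged phrases and entities keep their words joined by underscores, and the vectors are then trained on those lines. The helpers below (make_key, to_training_line, to_training_lines) are hypothetical, not the code in the actual scripts, and they assume you already have tokens paired with a sense tag (a POS tag or entity label).

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical helpers, not part of sense2vec: they only illustrate the
# "text|SENSE" key format that the vectors are trained on.


def make_key(text: str, sense: str) -> str:
    """Build a key, e.g. ("New York", "GPE") -> "New_York|GPE".

    Whitespace inside a merged phrase becomes "_" so that each key stays
    a single whitespace-free token on the training line.
    """
    return re.sub(r"\s+", "_", text.strip()) + "|" + sense


def to_training_line(tagged: Iterable[Tuple[str, str]]) -> str:
    """Turn one sentence of (text, sense) pairs into one line of keys."""
    return " ".join(make_key(text, sense) for text, sense in tagged)


def to_training_lines(sentences: Iterable[Iterable[Tuple[str, str]]]) -> List[str]:
    """One output line per sentence (assumed one-sentence-per-line layout)."""
    return [to_training_line(sentence) for sentence in sentences]


if __name__ == "__main__":
    # Hypothetical, already-tagged input: "New York" is a merged entity
    # that gets its entity label as the sense.
    sentences = [
        [("We", "PRON"), ("stayed", "VERB"), ("in", "ADP"),
         ("New York", "GPE"), ("since", "SCONJ"), ("the", "DET"),
         ("train", "NOUN"), ("was", "AUX"), ("late", "ADJ")],
    ]
    for line in to_training_lines(sentences):
        print(line)
```

Running it prints a single line, We|PRON stayed|VERB in|ADP New_York|GPE since|SCONJ the|DET train|NOUN was|AUX late|ADJ. The real scripts also handle parsing, phrase merging and filtering for you, so this is only meant to show the shape of the data.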

To train meaningful vectors, you typically want to use a lot of text, like 1 billion words. So your 7,000 sentences likely won't be enough. Maybe you can find other similar texts from a different source that you can use.

Once you have a sense2vec model, you can then use the vectors to find more similar terms. Not sure if "since" is a good seed term here, because there are not that many similar expressions. It works better for things like (proper) nouns.
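For intuition about what "finding similar terms" means: every key has a vector, and the most similar keys are the ones whose vectors have the highest cosine similarity to the query's vector. The sense2vec library does this for you (its README shows s2v.most_similar(query, n=3)); the sketch below only illustrates the idea in plain Python, and the keys and three-dimensional vectors are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy illustration only, not the sense2vec implementation.


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(
    query: str, vectors: Dict[str, Sequence[float]], n: int = 3
) -> List[Tuple[str, float]]:
    """Return the n keys closest to `query`, excluding the query itself."""
    q = vectors[query]
    scored = [(key, cosine(q, vec)) for key, vec in vectors.items() if key != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Made-up vectors: real sense2vec vectors have hundreds of dimensions.
toy_vectors = {
    "cause|NOUN": [0.9, 0.1, 0.0],
    "reason|NOUN": [0.85, 0.15, 0.05],
    "consequence|NOUN": [0.7, 0.3, 0.1],
    "banana|NOUN": [0.0, 0.2, 0.95],
}

if __name__ == "__main__":
    for key, score in most_similar("cause|NOUN", toy_vectors, n=2):
        print(f"{key}\t{score:.3f}")
```

With these toy numbers, reason|NOUN and consequence|NOUN come out on top and banana|NOUN ranks last. With real vectors, how useful the neighbours are depends on how often the seed term and its alternatives occur in the training text.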


Like always, very informative! You are right: I only have around 150,000 tokens. Many thanks. Suppose I find another corpus: can I use the Prodigy commands instead of the scripts for preprocessing?

Is there any other way I can use sense2vec in combination with NER to expand and improve my entities?

Could you have a look at my other question here?

Many thanks,
Stay healthy

@robertto @ines I hope you are all doing well. I am struggling to train a custom sense2vec model for a language other than English. I have already prepared my data as expected, but I couldn't figure out how to feed the data to the model. Any help is appreciated, thank you!

Can you share more details on what exactly isn't working for you, or the problem you're hitting?

The step-by-step scripts should show you the full end-to-end process (see the explosion/sense2vec repository on GitHub, "Contextually-keyed word vectors"). So if you've already prepared your data, you should be able to run those scripts in order, and the 05_export.py script will then output the trained sense2vec vectors.
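If it helps with debugging before the export step: assuming you trained with fastText and saved its plain-text .vec output, that file starts with a header line giving the number of vectors and their dimension, followed by one key and its vector per line. The hypothetical reader below (read_vec and split_key are not part of the sense2vec scripts) parses that format and splits each key back into text and sense. This is a quick way to check that your non-English data really came out as text|SENSE keys.

```python
import io
from typing import Dict, List, TextIO, Tuple

# Hypothetical debugging helpers, not part of sense2vec. Assumes the
# fastText text format: "<count> <dims>" header, then "key v1 ... vN".


def read_vec(handle: TextIO) -> Dict[str, List[float]]:
    """Read a fastText-style .vec text file into a key -> vector dict."""
    count, dims = (int(x) for x in handle.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.split()  # keys contain no whitespace, so this is safe
        if len(parts) != dims + 1:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        print(f"warning: header says {count} vectors, read {len(vectors)}")
    return vectors


def split_key(key: str) -> Tuple[str, str]:
    """Split "New_York|GPE" into ("New York", "GPE")."""
    text, sep, sense = key.rpartition("|")
    if not sep:
        raise ValueError(f"not a text|SENSE key: {key!r}")
    return text.replace("_", " "), sense


if __name__ == "__main__":
    # Tiny made-up German example standing in for a real .vec file.
    sample = io.StringIO(
        "2 3\n"
        "Berlin|GPE 0.1 0.2 0.3 \n"
        "weil|SCONJ 0.4 0.5 0.6 \n"
    )
    vectors = read_vec(sample)
    for key, vector in vectors.items():
        print(split_key(key), len(vector))
```

If split_key raises on many of your keys, the preprocessing output probably doesn't have the expected format, and it's worth re-checking that step before training and exporting again.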
