custom sense2vec

robertto · March 24, 2020, 8:29pm

Hi All,
@ines
I hope you are OK and healthy. I have two types of questions: theoretical and implementation.

1- I would like to know your thoughts on how I can use sense2vec on a specific corpus (a book with 7000 sentences) to add more semantics to my word vectors or my custom NER.

2- How should I implement it? I followed this

but I got a bit lost. I was able to run this command on my specific corpus:

python -m prodigy sense2vec.teach data_merged_v22 C:/Users/moha/Documents/Models/s2v_old --seeds "since"

Here I used "since" as the seed and was looking for other cue words related to causation. However, I know that you also used sense2vec to improve NER.

I would be happy if you could give me some hints, mainly on how I can train my own sense2vec vectors and how I can use them to add more semantics to my model.

robertto · March 26, 2020, 12:19pm

I would be very thankful if someone could give me some ideas about my question. Many thanks.

ines · March 26, 2020, 8:13pm

If you want to train your model, the link you shared is the way to go – you need to follow the steps and run the scripts for preprocessing, and then use either FastText or GloVe to train the vectors: https://github.com/explosion/sense2vec#-training-your-own-sense2vec-vectors

To train meaningful vectors, you typically want to use a lot of text, like 1 billion words. So your 7000 sentences likely won't be enough. Maybe you can find other similar texts from a different source that you can use.

Once you have a sense2vec model, you can then use the vectors to find more similar terms. Not sure if "since" is a good seed term here, because there are not that many similar expressions. It works better for things like (proper) nouns.
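As a rough illustration of what "finding more similar terms" means, here is a minimal sketch in plain Python. It assumes sense2vec-style keys of the form `phrase|SENSE`; the toy vectors, the example phrases, and the `make_key`, `cosine` and `most_similar` helpers are all invented for illustration and are not the library's API.

```python
import math


def make_key(phrase, sense):
    # sense2vec-style key: spaces become underscores, sense tag follows a pipe
    return phrase.replace(" ", "_") + "|" + sense


def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy 3-dimensional vectors, made up for this example (not trained)
VECTORS = {
    make_key("heart attack", "NOUN"): [0.9, 0.1, 0.0],
    make_key("cardiac arrest", "NOUN"): [0.8, 0.2, 0.1],
    make_key("stroke", "NOUN"): [0.7, 0.3, 0.2],
    make_key("since", "ADP"): [0.0, 0.1, 0.9],
}


def most_similar(key, n=2):
    # Rank every other key by cosine similarity to the query key
    query = VECTORS[key]
    scored = [
        (other, cosine(query, vec))
        for other, vec in VECTORS.items()
        if other != key
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


seed = make_key("heart attack", "NOUN")
neighbours = most_similar(seed)
for key, score in neighbours:
    print(key, round(score, 3))
```

With trained vectors, the library's own `Sense2Vec.from_disk` loader and its `most_similar` method would take the place of the toy dictionary and helper above.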

robertto · March 26, 2020, 8:32pm

As always, very informative! You are right, I only have around 150000 tokens. Many thanks. Suppose I find another corpus: can I use the Prodigy commands instead of the scripts for pre-processing?

Is there any other way I can use sense2vec in combination with NER to expand and improve my entities?

Could you have a look at my other question here?

Many thanks,
Stay healthy

henokDES · August 14, 2021, 1:07pm

@robertto @ines I hope you are all doing well. I am struggling to train a custom sense2vec model for a language other than English. Please help me out: I have already prepared my data as expected, but I couldn't figure out how to feed the data to the model. Any help is appreciated, thank you.

ines · August 15, 2021, 1:01am

Can you share more details on what exactly isn't working for you, or the problem you're hitting?

The step-by-step scripts should show you the full end-to-end process: GitHub - explosion/sense2vec: 🦆 Contextually-keyed word vectors. So if you've already prepared your data, you should be able to run those scripts in order, and the 05_export.py script will then output the trained sense2vec vectors.
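To make "run those scripts in order" concrete, here is a small sketch that only lists the order of the steps for each training backend. The script names are the ones I recall from the sense2vec repository's scripts directory, so check them against the README. The `plan` helper is hypothetical, and command-line arguments are deliberately left out because they depend on your setup.

```python
# Script names as recalled from the sense2vec repository (verify against the
# README). Each entry: (script, purpose, backends that use it).
STEPS = [
    ("01_parse.py", "parse raw text with spaCy", {"glove", "fasttext"}),
    ("02_preprocess.py", "write text in the sense2vec format", {"glove", "fasttext"}),
    ("03_glove_build_counts.py", "build vocabulary and counts", {"glove"}),
    ("04_glove_train_vectors.py", "train vectors with GloVe", {"glove"}),
    ("04_fasttext_train_vectors.py", "train vectors with FastText", {"fasttext"}),
    ("05_export.py", "export the trained sense2vec vectors", {"glove", "fasttext"}),
]


def plan(backend):
    # Hypothetical helper: return the scripts to run, in order, for a backend
    if backend not in ("glove", "fasttext"):
        raise ValueError("backend must be 'glove' or 'fasttext'")
    return [name for name, _purpose, backends in STEPS if backend in backends]


for script in plan("fasttext"):
    print(script)
```

The GloVe route has one extra step (building the counts) before training, while the FastText route goes straight from preprocessing to training. Both end with 05_export.py.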
