Hi All, @ines
I hope you are well and healthy. I have two kinds of questions: theoretical and implementation.
1- I'd like your thoughts on how I can use sense2vec on a specific corpus (a book with 7000 sentences) to add more semantic information to my word vectors or my custom NER.
2- How should I implement it? I followed this
but I got a bit lost. I could run this command on my specific corpus
To train meaningful vectors, you typically want to use a lot of text, like 1 billion words. So your 7000 sentences likely won't be enough. Maybe you can find other similar texts from a different source that you can use.
Once you have a sense2vec model, you can then use the vectors to find more similar terms. Not sure if "since" is a good seed term here, because there are not that many similar expressions. It works better for things like (proper) nouns.
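To illustrate what "finding more similar terms" means, here is a minimal sketch in plain Python. It is not the sense2vec API: the `most_similar` helper, the `TOY_VECTORS` table and its three-dimensional numbers are all made up for illustration. The only thing taken from sense2vec is the key format, `phrase|SENSE`.

```python
import math

# Made-up 3-dimensional vectors keyed by "phrase|SENSE" (hypothetical data).
# Real vectors would come from a trained model and be much larger.
TOY_VECTORS = {
    "duck|NOUN": [0.9, 0.1, 0.0],
    "goose|NOUN": [0.8, 0.2, 0.1],
    "duck|VERB": [0.1, 0.9, 0.2],
    "since|ADP": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(seed, vectors, n=2):
    """Rank every other key by cosine similarity to the seed key."""
    seed_vec = vectors[seed]
    scored = [
        (key, cosine(seed_vec, vec))
        for key, vec in vectors.items()
        if key != seed
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]


# The noun seed ranks the other noun first and the verb sense second.
print(most_similar("duck|NOUN", TOY_VECTORS))
```

Because the sense is part of the key, "duck|NOUN" and "duck|VERB" get separate vectors, which is why a noun seed tends to pull in other nouns.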
As always, very informative! You are right, I only have around 150,000 tokens. Many thanks. Supposing I find another corpus, can I use the Prodigy commands instead of the scripts for pre-processing?
Is there any other way to use sense2vec in combination with NER to expand and improve my entities?
@robertto @ines I hope you are all doing well. I am struggling to train a custom sense2vec model for a language other than English. I have already prepared my data as expected, but I couldn't figure out how to feed it to the model. Any help is appreciated, thank you.
Can you share more details on what exactly isn't working for you, or the problem you're hitting?
The step-by-step scripts should show you the full end-to-end process: GitHub - explosion/sense2vec: 🦆 Contextually-keyed word vectors. So if you've already prepared your data, you should be able to run those scripts in order, and the 05_export.py script will then output the trained sense2vec vectors.
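As a rough check that prepared data is in the shape the vector-training step can consume, here is a minimal sketch of turning one tagged sentence into a line of `phrase|SENSE` tokens. It assumes the prepared format is one sentence per line, with whitespace inside a phrase replaced by underscores. The helper names and the example sentence are hypothetical, and the tags themselves would come from whatever spaCy pipeline is available for your language.

```python
import re


def make_key(text, sense):
    """Join a phrase and its tag into one token, e.g. 'machine_learning|NOUN'."""
    # Whitespace inside the phrase becomes "_" so the phrase stays one token.
    return re.sub(r"\s", "_", text) + "|" + sense


def to_training_line(tagged):
    """Turn (text, sense) pairs for one sentence into a space-separated line."""
    return " ".join(make_key(text, sense) for text, sense in tagged)


# Hypothetical tagged sentence; in practice these pairs would be produced
# by the parsing and preprocessing steps rather than written by hand.
tagged_sentence = [
    ("natural language processing", "NOUN"),
    ("is", "AUX"),
    ("fun", "ADJ"),
]

line = to_training_line(tagged_sentence)
print(line)
```

If your prepared files look noticeably different from lines like this, that may be where the pipeline is getting stuck, and it would help to share a short sample.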