sense2vec.teach vectors

dsr2021 · August 10, 2021, 1:15am

I'm trying to use sense2vec.teach for the first time. I'm using Prodigy nightly 1.11.0a11. Following the instructions at https://prodi.gy/docs/recipes#terms-teach, I used en_core_web_lg as the vectors file:

python -m prodigy sense2vec.teach my_data_set en_core_web_lg --seeds prodigy_data\seeds.txt

However, I get the following:

raise ValueError(f"Can't read file: {location}")
ValueError: Can't read file: en_core_web_lg\cfg

Also, if I try to download en_vectors_web_lg, I get an HTTP 404 error, presumably because the spaCy version is 3.1.1.

ines · August 11, 2021, 12:27am

Hi! The sense2vec.teach recipe takes the path to trained sense2vec vectors, not a regular spaCy pipeline. So you'll want to download one of the pretrained vector packages from the sense2vec repo (GitHub: explosion/sense2vec) and use the path to that instead. There's also an example command lower down in the docs.
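
The `Can't read file: en_core_web_lg\cfg` error above suggests that the recipe looks for a `cfg` file inside the vectors directory. As a rough sanity check before launching the recipe, something like the sketch below could confirm that a path at least looks like a sense2vec directory. The helper name `looks_like_s2v_dir` is made up for illustration, and checking only for `cfg` is an assumption based on the error message, not a full validation of the vectors.

```python
from pathlib import Path


def looks_like_s2v_dir(path):
    """Return True if `path` is a directory that contains a `cfg` file.

    Hypothetical helper: only `cfg` is checked, because that is the file
    named in the error message quoted in this thread.
    """
    p = Path(path)
    return p.is_dir() and (p / "cfg").is_file()


# Example (the path is a placeholder for wherever the vectors were unpacked):
# looks_like_s2v_dir("/path/to/s2v_vectors")
```

If this returns False for the path passed to sense2vec.teach, the recipe will most likely fail the same way it did with en_core_web_lg.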

dsr2021 · August 12, 2021, 2:33am

OK, here's what I want to do, and maybe sense2vec isn't the solution. I would like to create patterns with terms.teach and terms.to-patterns, except that terms.teach doesn't appear to handle terms with spaces ("multi-word terms"?), according to the documentation (I haven't tried it yet). So I was planning to use sense2vec.teach and terms.to-patterns. What would you suggest I do to extract patterns from datasets that contain my annotations? I have a script that extracts the text in spans per label.

I have a very specific (large) set of NEs that I would like to be recognized, and they may not appear in Reddit discussions. I think that the purpose of patterns is to avoid having billions of examples to train against.

ines · August 12, 2021, 12:23pm

Yes, the problem with word2vec is that it's... well, word2vec. So you'll only be able to get vectors and compare similarities for individual words, not phrases. sense2vec solves this by training a model on preprocessed text that merges noun phrases and entities and includes labels or part-of-speech tags. This lets you write more specific queries and check similarities for multi-word phrases.
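
To make the multi-word querying a bit more concrete, here is a minimal sketch using the sense2vec Python library. It assumes the `sense2vec` package is installed and that pretrained vectors have been unpacked to a local directory. The helper names `make_s2v_key` and `similar_phrases` are hypothetical, and the key helper only approximates the `phrase_with_underscores|SENSE` key format used by sense2vec.

```python
import re


def make_s2v_key(phrase, sense):
    """Build a sense2vec-style key such as 'machine_learning|NOUN'."""
    # Whitespace becomes underscores, then the sense label is appended.
    return re.sub(r"\s+", "_", phrase.strip()) + "|" + sense


def similar_phrases(vectors_path, phrase, sense="NOUN", n=5):
    """Load sense2vec vectors from disk and return up to n similar keys.

    Returns a list of (key, score) tuples, or an empty list if the
    phrase is not in the vectors.
    """
    # Imported lazily so the key helper works without the package installed.
    from sense2vec import Sense2Vec

    s2v = Sense2Vec().from_disk(vectors_path)
    key = make_s2v_key(phrase, sense)
    if key not in s2v:
        return []
    return s2v.most_similar(key, n=n)


# Example (placeholder path):
# similar_phrases("/path/to/s2v_vectors", "natural language processing")
```

Phrases that are missing from the pretrained vectors simply come back empty here, which is a quick way to see how well a domain-specific term list is covered.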

So it does seem like sense2vec would be a good fit for what you're doing. You just need to pick one of the available pretrained vector files and then you can load them in and find other similar terms given a list of seed terms.
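
On the patterns side of the question: if a script already extracts the span texts per label, one option is to write them straight to a patterns file in Prodigy's JSONL format. The sketch below is only an illustration. The function names and the `DRUG` label are made up, and splitting on whitespace is an assumption that only approximates spaCy's tokenization, so phrases containing punctuation or hyphens may need their token patterns adjusted by hand.

```python
import json


def phrases_to_patterns(phrases_by_label):
    """Yield one token-based pattern dict per phrase.

    `phrases_by_label` maps a label to a list of phrases, e.g.
    {"DRUG": ["acetylsalicylic acid"]} (hypothetical label and phrase).
    """
    for label, phrases in phrases_by_label.items():
        for phrase in phrases:
            # Assumption: whitespace splitting stands in for real tokenization.
            tokens = [{"lower": tok.lower()} for tok in phrase.split()]
            if tokens:
                yield {"label": label, "pattern": tokens}


def write_patterns(phrases_by_label, out_path):
    """Write the patterns as JSONL, one pattern per line."""
    with open(out_path, "w", encoding="utf8") as f:
        for pattern in phrases_to_patterns(phrases_by_label):
            f.write(json.dumps(pattern) + "\n")


# Example (placeholder file name):
# write_patterns({"DRUG": ["Acetylsalicylic Acid"]}, "patterns.jsonl")
```

Matching on `lower` keeps the patterns case-insensitive, which is usually what you want for a term list.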
