Can't find seed term

dbl · April 13, 2021, 5:40pm

Hi! I have a long list of seeds that I tried to use with sense2vec.teach. However, many of my terms didn't come up (even in the large Reddit dataset). The current approach bonks at the first term not found. So I remove it from my list, then run again. Rinse & repeat until I either get tired and just take what I've got or until I make it to the end of my term list.

It would be very handy if instead of stopping at the first term not found, all terms were checked and then reported on. For example, if I use --seeds "A, B, C, D, E, F" and B-E aren't found, instead of me running 5 times, with a message in 4 of those about a single seed, I could run just twice. The first time, I'd be told ✘ Can't find seed terms: 'B', 'C', 'D', 'E'.
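
A rough sketch of the requested behavior, for illustration only: check every seed first, then report all the missing ones in a single message. The helper names `find_missing_seeds` and `format_missing` are hypothetical, and `lookup` is an assumed callable that returns None when a term is unknown. A plain dict stands in for the vectors here so the snippet runs on its own.

```python
from typing import Callable, Iterable, List, Optional


def find_missing_seeds(
    seeds: Iterable[str], lookup: Callable[[str], Optional[str]]
) -> List[str]:
    """Return every seed for which lookup() finds no match (returns None)."""
    return [seed for seed in seeds if lookup(seed) is None]


def format_missing(missing: List[str]) -> str:
    """Build a single message that names all missing seeds at once."""
    quoted = ", ".join(f"'{seed}'" for seed in missing)
    return f"Can't find seed terms: {quoted}"


# Stand-in vocabulary for illustration only (hypothetical keys).
vocab = {"A": "A|NOUN", "F": "F|NOUN"}

# Split the comma-separated seed string and trim surrounding whitespace.
seeds = [s.strip() for s in "A, B, C, D, E, F".split(",")]

missing = find_missing_seeds(seeds, vocab.get)
if missing:
    print(format_missing(missing))
```

With this shape, one run is enough to learn about every missing seed, and a second run with the cleaned-up list can proceed.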

ines · April 14, 2021, 1:16am

Thanks, that's a good point! I think the intention here was to exit as early as possible, but I can see how this is really inconvenient in cases like this. Looking at the code again, I even wonder if we should make this a warning instead and just skip all terms that are not in the vectors, and maybe only raise and exit if there are none left.

In the meantime (and in case others come across this thread later), here's a quick script to prune a longer list of seed terms:

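from sense2vec import Sense2Vec

seeds = ["A", "B", "C", "D", "E", "F"]
# Load the pretrained vectors from disk
s2v = Sense2Vec().from_disk("/path/to/s2v")

pruned_seeds = []
for seed in seeds:
    # get_best_sense returns None if the term isn't in the vectors
    key = s2v.get_best_sense(seed)
    if key is not None:
        pruned_seeds.append(seed)

# pruned_seeds now holds only the seeds that were found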
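
Building on the script above, the warn-and-skip idea (skip terms that are not in the vectors, and only raise if none are left) might look roughly like the sketch below. This is not the recipe's actual code: `keep_known_seeds` is a hypothetical helper, and `lookup` is assumed to be any callable that returns None for unknown terms, such as `s2v.get_best_sense`.

```python
import warnings
from typing import Callable, Iterable, List, Optional


def keep_known_seeds(
    seeds: Iterable[str], lookup: Callable[[str], Optional[str]]
) -> List[str]:
    """Drop seeds that lookup() can't resolve; fail only if none remain."""
    found: List[str] = []
    missing: List[str] = []
    for seed in seeds:
        # Sort each seed into the matching bucket
        if lookup(seed) is not None:
            found.append(seed)
        else:
            missing.append(seed)
    if not found:
        # Nothing usable is left, so there is no point in continuing
        raise ValueError(f"None of the seed terms were found: {missing}")
    if missing:
        # Some seeds are usable: warn about the rest and carry on
        warnings.warn(f"Skipping seed terms not in the vectors: {missing}")
    return found
```

With the loaded vectors from the script above, this could be called as `keep_known_seeds(seeds, s2v.get_best_sense)`.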

dbl · April 14, 2021, 6:02am

Thanks, Ines! I agree that a warning here as long as there’s at least one seed left is probably the way to go. I’m definitely going to take advantage of your workaround in the meantime.
