Hi, I just bought Prodigy and am having a little trouble getting started. This works:
prodigy terms.teach terms_test en_core_web_sm
But not this:
prodigy terms.teach terms_test en_core_web_sm --seeds "great, excellent"
Whenever I add seed terms I get the following warning repeated and see an error message on the web app:
UserWarning: [W008] Evaluating Lexeme.similarity based on empty vectors.
I’ve tried using en_core_web_sm, en_core_web_lg, and en_vectors_web_lg. I’m using spaCy 2.1.4 and Prodigy 1.8.1. Any suggestions?
If you’re using the en_core_web_sm model, that warning is expected, because that model doesn’t have any word vectors. For all other models, it usually means that one or more seed terms you’re trying to add are not in your vectors table. However, it does seem unlikely given the terms “great” and “excellent”.
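One way to rule out the "seed term not in the vectors table" explanation is to look up each seed and check whether its lexeme has a vector. The sketch below is only an illustration: `seeds_without_vectors` and `ToyVocab` are hypothetical names, and the toy vocab is a stand-in so the example runs without a model installed. With spaCy, the same function should work on `nlp.vocab`, since looking up a string there returns a lexeme with a `has_vector` attribute.

```python
from types import SimpleNamespace


def seeds_without_vectors(vocab, seeds):
    """Return the seed terms whose lexeme reports no vector in `vocab`."""
    return [term for term in seeds if not vocab[term].has_vector]


class ToyVocab:
    """Hypothetical stand-in for a spaCy vocab (assumption, for illustration only)."""

    def __init__(self, words_with_vectors):
        self._known = set(words_with_vectors)

    def __getitem__(self, word):
        # Mimics a lexeme lookup: every word resolves, but only some have vectors.
        return SimpleNamespace(text=word, has_vector=word in self._known)


toy_vocab = ToyVocab({"great", "excellent"})
missing = seeds_without_vectors(toy_vocab, ["great", "excellent", "grrreat"])
print(missing)  # ['grrreat']

# With a real model (not run here):
#   import spacy
#   nlp = spacy.load("en_core_web_lg")
#   print(seeds_without_vectors(nlp.vocab, ["great", "excellent"]))
```

An empty list for your real seeds would suggest the seeds themselves are fine and the problem lies elsewhere.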
What do the suggestions look like when you use a model like en_vectors_web_lg and you go through them in the app? Do they make sense?
Edit: Okay, I can reproduce this with the large vector model. It seems like for some reason, spaCy thinks that the underlying lexemes it’s comparing (seed terms vs. model vocab) do not have vectors. So the similarity comparison doesn’t produce any results, and the user warning is shown (which is really just a warning, because spaCy does allow empty similarity comparisons – it’s just normally not what you want and a very helpful warning in cases like this).
Thanks for the quick follow-up. It isn’t just a warning though: I can’t get terms.teach to work at all if I use seeds.
When I use en_core_web_sm I get the following error:
ValueError: [E010] Word vectors set to length 0. This may be because you
don’t have a model installed or loaded, or because your model doesn’t
include word vectors. For more info, see the docs:
(In case it helps diagnose: this error doesn’t show until I actually try to open the web app in my browser.)
And when I use en_core_web_lg, I get the user warning mentioned earlier (repeated ad infinitum) and the web app remains stuck on ‘Loading…’
Edit: I forgot to answer your suggestion! en_vectors_web_lg shows the same warning as en_core_web_lg and also gets stuck at the loading screen with the warning repeating.
The en_core_web_sm error is expected: this is spaCy telling you that the model doesn’t have word vectors. If you use a model that has word vectors, you won’t see that error. But for some reason, spaCy thinks it’s comparing empty vectors, so you see the warning and get no suggestions.
Anyway, we’ll hopefully have a fix for this today!
Just implemented the fix for this, which should be released today. I’ve implemented an extra check in the recipe to avoid iterating over words which don’t have a vector, to avoid the warning spam.
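To illustrate the idea behind that check, here is a minimal sketch of ranking vocab entries by similarity to the seeds while skipping anything without a usable vector. This is not the actual recipe code: the plain dict of vectors, `has_vector`, and `rank_candidates` are hypothetical stand-ins, and treating an all-zero vector as "no vector" is an assumption.

```python
import math


def has_vector(vec):
    """Treat a missing or all-zero vector as 'no vector' (assumption)."""
    return vec is not None and any(x != 0.0 for x in vec)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(vectors, seeds):
    """Rank non-seed words by their best similarity to any seed."""
    seed_vecs = [vectors[s] for s in seeds if has_vector(vectors.get(s))]
    if not seed_vecs:
        return []
    scored = []
    for word, vec in vectors.items():
        # Skip seeds themselves and any entry without a usable vector,
        # so no similarity is ever computed on an empty vector.
        if word in seeds or not has_vector(vec):
            continue
        scored.append((max(cosine(vec, sv) for sv in seed_vecs), word))
    scored.sort(reverse=True)
    return [word for _, word in scored]


toy_vectors = {
    "great": [1.0, 0.0],
    "excellent": [0.9, 0.1],
    "superb": [0.8, 0.2],
    "terrible": [-1.0, 0.1],
    "xqzt": [0.0, 0.0],  # entry in the vocab, but no vector
}
ranked = rank_candidates(toy_vectors, ["great", "excellent"])
print(ranked)  # ['superb', 'terrible']
```

The entry without a vector never reaches the similarity computation, which is the step that would otherwise trigger a warning per word.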
In the meantime, you can set the environment variable SPACY_WARNING_IGNORE="W008" to prevent the warnings from coming up. The warnings are really the only problem here: with so many warnings printed, the loop runs fairly slowly, which is why you’re not seeing anything coming up. If you suppress the warnings and let the similarity function run, you should see results.
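If you launch Prodigy from a Python script rather than directly from the shell, one way to apply this workaround is to pass the variable in the child process environment. This is only a sketch: `env_with_ignored_warning` is a hypothetical helper, the command simply mirrors the one from this thread with a vectors model swapped in, and the actual launch is left commented out.

```python
import os


def env_with_ignored_warning(warning_id="W008", base_env=None):
    """Return a copy of the environment with SPACY_WARNING_IGNORE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SPACY_WARNING_IGNORE"] = warning_id
    return env


command = [
    "prodigy", "terms.teach", "terms_test", "en_core_web_lg",
    "--seeds", "great, excellent",
]
env = env_with_ignored_warning()
print(env["SPACY_WARNING_IGNORE"])  # W008

# To actually start the recipe with the variable set (not run here):
#   import subprocess
#   subprocess.run(command, env=env, check=True)
```

Copying the environment instead of mutating `os.environ` keeps the change scoped to the Prodigy process.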
Just released v1.8.2, which adds a fix to the recipe to automatically skip words with no vectors. This prevents the warning from being raised, so it should now also work as expected without the environment variable.
Thanks, y’all. 1.8.2 works for me.
I'm running into this when I try to use en_core_web_trf instead of en_core_web_lg. Is this expected?
The trf model doesn't have any word vectors, so it won't work with the current terms.teach workflow, which iterates over the model's vocab and uses word vectors to suggest similar terms.