Problem with creating terminology list after upgrade to 2.1

Hi!

After updating both Prodigy and spaCy, I can’t create a terminology list as before.
I have downloaded a word vector file from fastText.

python -m spacy init-model sv vectors_sv_wiki --vectors-loc cc.la.300.vec.gz

After that, I load the model with the following command:

prodigy terms.teach cities vectors_sv_wiki --seeds "broadband, internet"

It starts the annotation tool, but as soon as I open the page I get the following error in the console:

UserWarning: [W008] Evaluating Lexeme.similarity based on empty vectors

Can you please help me out here?

I’m not positive but I think what’s happening here is you’re using seed terms that aren’t in your vectors table. This means you’re ending up with a 0 vector as the query. Does that sound likely, or are the terms broadband and internet definitely in your vectors file?
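One quick way to rule this out is to scan the vectors file directly for the seed terms. Here's a minimal sketch, assuming the usual fastText text format (a header line with the word count and dimension, then one word per line followed by its values). The helper name `seeds_in_vec_file` is made up for this example, and the match is case-sensitive.

```python
import gzip


def seeds_in_vec_file(path, seeds):
    """Report, for each seed term, whether it appears as a word in a .vec / .vec.gz file."""
    # Strip whitespace so seeds split from "broadband, internet" match cleanly.
    wanted = {s.strip() for s in seeds if s.strip()}
    found = set()
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf8") as f:
        next(f, None)  # assumed header line: "<n_words> <dim>"
        for line in f:
            word = line.split(" ", 1)[0]
            if word in wanted:
                found.add(word)
                if found == wanted:
                    break  # every seed found, no need to read the rest
    return {seed: seed in found for seed in wanted}
```

For example, `seeds_in_vec_file("cc.la.300.vec.gz", ["broadband", "internet"])` returns a dict mapping each seed to `True` or `False`. Any seed mapped to `False` would end up with an empty vector.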

I am more than sure that they exist in the vectors file, since I’m using the same file as before and it worked fine then. The only difference here is the upgraded version of spaCy and Prodigy.

@mikael Do the suggestions you see in the app make sense or is it random? Like, do you see vaguely similar terms to “broadband” and “internet”?

No, there are actually no suggestions at all. The app is empty and I receive the lexeme warning in the console.

Thanks! And okay, there seems to be something wrong with the similarity comparison – we'll look into this. It's possible that it just requires a small fix to the recipe, which you can patch yourself in the meantime.

Also see my comment on this thread:

Thanks for the answer, but it doesn’t seem to be just a warning. I have been waiting for about 20 minutes now and it still looks like this.

Ah yes, what I meant was, the underlying thing that Python outputs here is a warning (UserWarning), not an error that stops the process. In other contexts, it could just mean that one word doesn’t have a word vector, which is totally fine. But in this case for some reason, spaCy thinks it’s comparing empty vectors, so you’ll see a warning and there are no suggestions. That’s why it’s stuck at “Loading…” and nothing comes up.

Anyway, we’ll hopefully have a fix for this today!

Just implemented the fix for this, which should be released today. I’ve implemented an extra check in the recipe to avoid iterating over words which don’t have a vector, to avoid the warning spam.
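To illustrate the idea (this is a rough sketch, not the actual recipe code), the check boils down to skipping any entry without a vector before computing a similarity. The function name `rank_by_similarity` is hypothetical. It only assumes objects that expose `text`, `has_vector` and `similarity()`, which spaCy's `Lexeme` objects do.

```python
def rank_by_similarity(seeds, candidates, top_n=10):
    """Rank candidates by mean similarity to the seeds, skipping empty vectors."""
    seeds = [s for s in seeds if s.has_vector]
    if not seeds:
        raise ValueError("None of the seed terms has a vector")
    scored = []
    for lex in candidates:
        if not lex.has_vector:
            continue  # comparing against an empty vector is what triggers W008
        score = sum(lex.similarity(s) for s in seeds) / len(seeds)
        scored.append((score, lex.text))
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_n]
```

With a loaded spaCy model, you could pass `[nlp.vocab[w] for w in ("broadband", "internet")]` as the seeds and `nlp.vocab` as the candidates.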

In the meantime, you can set the environment variable SPACY_WARNING_IGNORE="W008" to prevent the warnings from coming up. The warnings are really the only problem here: with so many warnings printed, the loop runs fairly slowly, which is why you’re not seeing anything coming up. If you suppress the warnings and let the similarity function run, you should see results.
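If you start things from a Python script rather than the shell, the same variable can be set in code. A small sketch, assuming spaCy reads the variable when it is first imported, so the assignment has to come before any `import spacy` or `import prodigy`:

```python
import os

# Assumption: spaCy checks this variable at import time, so set it first.
os.environ["SPACY_WARNING_IGNORE"] = "W008"

# Only import spacy / prodigy after this point.
```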

Awesome! Thanks! It’s working now with the environment variable :slight_smile:
Thanks!

Just released v1.8.2, which adds a fix to the recipe to automatically skip words with no vectors. This prevents the warning from being raised. So it should now also work as expected without the environment variable :slightly_smiling_face: