Prodigy recipe on your GitHub page appears not to work. Out of date?

Hi!

When trying to use your recipe found at:

the following error repeatedly appears in the console, and words never load in the interface:

"Evaluating lexeme.similarity based on empty vectors"

I am running the recipe via:

prodigy terms.teach test1 en_vectors_web_lg --seeds "knitting, yarn, wool, jumper" -F matt-teach.py

I am hoping to use this file as a base to begin my own custom recipe changes from.

Hi! The terms.teach recipe works by iterating over the model's vocab, and the warning is shown by spaCy if one or more tokens of a similarity query do not have a vector. So basically, the en_vectors_web_lg vocab includes words without vectors, and those trigger the warnings. This also makes the similarity queries slower, so the stream takes longer to load.
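To illustrate why an empty vector is a problem for a similarity query, here is a minimal plain-Python sketch of cosine similarity. It is not spaCy's actual code: the function name `cosine_similarity` and the warning text are made up for the example, and the behaviour of warning and returning 0.0 is only an approximation of what happens for a zero vector.

```python
import math
import warnings


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (illustrative only)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # With a zero norm the division below is undefined, so warn and
    # fall back to 0.0 instead of computing a meaningless score.
    if norm_a == 0.0 or norm_b == 0.0:
        warnings.warn("similarity based on empty vectors")
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)
```

Every vocab entry without a vector ends up in that fallback branch, which is why the warning repeats for as long as the recipe keeps scoring such entries.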

The easiest solution is to just skip words with no vectors, e.g. by changing line 71 to this:

lexemes = [w for w in lexemes if w.orth not in seen and w.vector_norm]

This is probably a useful general-purpose fix we should also add to the recipe in the recipes repo.
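Here's a small self-contained sketch of what that filter does, so you can try it without loading a model. `StubLexeme` and `filter_candidates` are hypothetical stand-ins for this example only; they just mimic the `orth` and `vector_norm` attributes that real spaCy lexemes expose, and the norm values are invented.

```python
from dataclasses import dataclass


@dataclass
class StubLexeme:
    """Hypothetical stand-in for a spaCy lexeme (not the real class)."""
    orth: int            # ID of the word's text
    text: str
    vector_norm: float   # 0.0 means the word has no vector


def filter_candidates(lexemes, seen):
    # Same condition as the suggested fix: drop words that were already
    # shown and words whose vector norm is zero (i.e. no vector).
    return [w for w in lexemes if w.orth not in seen and w.vector_norm]


vocab = [
    StubLexeme(1, "yarn", 6.2),
    StubLexeme(2, "wool", 5.9),
    StubLexeme(3, "xqzt", 0.0),      # no vector, gets skipped
    StubLexeme(4, "knitting", 6.5),  # already seen, gets skipped
]
seen = {4}

candidates = filter_candidates(vocab, seen)
print([w.text for w in candidates])  # ['yarn', 'wool']
```

Because words without vectors never reach the similarity call, the warning is not triggered for them and the stream has fewer entries to score.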

Thanks for the reply.

The same command works perfectly via the built-in terms.teach (i.e. not pointing to a local file), using the exact same seed words and the exact same spaCy model.

e.g.
prodigy terms.teach test1 en_vectors_web_lg --seeds "knitting, yarn, wool, jumper"

works quickly and without issue.

Running your recipe from the GitHub page, however, causes the error. So I think a change has been made to the built-in terms.teach that has not been applied to the version listed in your GitHub recipes catalogue. If those recipes are there for us to use, then I think it is worth updating them accordingly.

Yes, that's the change I mentioned above :slightly_smiling_face: It's related to a change in spaCy v2.1 and we should definitely port this over to the recipes repo. Edit: Update here!

(In general, the scripts in prodigy-recipes aren't always 100% identical to the recipes shipped with the library – they're often simplified to focus on the core functionality and to make it easy to adapt them for custom recipes.)