Using terms.teach with Japanese

jeremy · June 2, 2021, 2:19am

When I use terms.teach with English words, it works very well and quickly suggests relevant words in the web app.

For example, the following commands bring up the web server, and I get many food-related terms:

python -m prodigy dataset food_seeds_en "seeds for foods english"
python -m prodigy terms.teach food_seeds_en en_core_web_lg --seeds "carrot, spinach, pasta, soba, udon, pizza, Bibimbap, gyudon"

However, when I do the same with the Japanese language model and Japanese seeds, all the terms shown to me in the web app are in Latin letters, not Japanese words. Why am I getting these results? Is the language model not registering properly?

python -m prodigy dataset food_seeds "seeds for foods"
python -m prodigy terms.teach food_seeds ja_core_news_lg --seeds "人参, ほうれん草, パスタ, そば, うどん, ピザ, ビビンバ, 牛丼"

ines · June 3, 2021, 1:18pm

Hi! It looks like the problem is that the terms.teach recipe filters the vocabulary entries it suggests like this:

lexemes = [lex for lex in stream if lex.is_alpha and lex.is_lower]

In Japanese, it turns out that all entries return False for is_lower (which delegates to Python's built-in string method islower()). Japanese characters have no upper or lower case, so islower() is never True for them. Because there are no lowercase Japanese strings, you only end up seeing some really random entries in Latin characters that happen to be lowercase.
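You can check this behaviour in plain Python without loading a model. This is just a small sketch using the seed terms from the question; the variable names are made up for illustration:

```python
# Seed terms from the question above, in both languages.
english = ["carrot", "spinach", "pasta"]
japanese = ["人参", "ほうれん草", "パスタ", "そば", "うどん"]

# str.islower() needs at least one cased character to return True.
english_lower = [word.islower() for word in english]
japanese_lower = [word.islower() for word in japanese]

# str.isalpha() only asks whether every character is a letter,
# so the Japanese terms still pass this half of the filter.
japanese_alpha = [word.isalpha() for word in japanese]

print(english_lower)
print(japanese_lower)
print(japanese_alpha)
```

So the is_alpha check is fine for Japanese, and it's only the is_lower check that throws everything away.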

As a quick workaround, you can run prodigy stats to find the location of your Prodigy installation and edit that line in recipe/terms.py. If you take out the and lex.is_lower part of the condition, you should see Japanese suggestions from the vectors.
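To illustrate the effect of that change, here is a rough sketch of the two filter conditions side by side. StandInLexeme is a hypothetical stand-in for spaCy's lexeme objects, not the real class, and it assumes is_alpha and is_lower behave like str.isalpha() and str.islower():

```python
from dataclasses import dataclass


@dataclass
class StandInLexeme:
    """Hypothetical stand-in for a vocabulary entry (not spaCy's class)."""

    text: str

    @property
    def is_alpha(self) -> bool:
        return self.text.isalpha()

    @property
    def is_lower(self) -> bool:
        return self.text.islower()


# A tiny made-up "vocabulary" mixing Japanese and Latin-script entries.
stream = [StandInLexeme(t) for t in ["人参", "うどん", "pizza", "Bibimbap", "123"]]

# Condition as quoted from the recipe: drops every Japanese entry.
original = [lex.text for lex in stream if lex.is_alpha and lex.is_lower]

# Condition with the "and lex.is_lower" part removed.
relaxed = [lex.text for lex in stream if lex.is_alpha]

print(original)
print(relaxed)
```

Note that the relaxed condition also lets through capitalized Latin-script entries like "Bibimbap", so you may see a few more of those among the suggestions.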

jeremy · June 24, 2021, 4:29am

That is super helpful, thanks!
