# Applying terms.teach for Chinese

Hi! For the recipe `terms.teach`, does it matter whether the initial seed list is in Traditional or Simplified Chinese? Will the results differ?

Also, when I reject a term suggested by the model, does the model apply any "negative" scoring to that term, so that subsequent suggestions take those irrelevant terms into account?

Thanks!

Welcome back @jsnleong

If you use a spaCy Chinese model (e.g. `zh_core_web_lg`), the results will differ: with simplified Chinese seeds you'll most likely receive simplified Chinese suggestions, and vice versa. The overall suggestions should still belong to the same semantic space regardless of the variant.

This is because the simplified and traditional tokens are treated as separate tokens in training and, consequently, they are represented by separate word vectors.

To illustrate, you can run this small experiment:

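``````
import numpy
import spacy

# Load a Chinese pipeline that ships with word vectors (the model mentioned above)
nlp = spacy.load("zh_core_web_lg")

simplified = nlp("书")  # book
traditional = nlp("書")  # book

print(f"Simplified has vector: {simplified.has_vector}")
print(f"Traditional has vector: {traditional.has_vector}")
# Compare the vectors themselves rather than only their norms
print(f"The vectors are the same: {numpy.array_equal(simplified.vector, traditional.vector)}")

# Output
# Simplified has vector: True
# Traditional has vector: True
# The vectors are the same: False
``````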

Now, if you compared the output of `terms.teach` for these terms, you'd see that the exact suggestions are different, but the semantic space is the same for the simplified seed and the traditional seed:

``````
simplified result | translation | traditional result | translation
``````

Regarding the rejected terms: yes, the model takes them into account. It iteratively updates a `negative_vector`, and the similarity between a candidate term and that `negative_vector` gives the `reject_score`. The `reject_score` is then used to compute the final score for a term with this formula:

``````
score = accept_score / (accept_score + reject_score + 0.2)
``````
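
To make the effect of a rejection concrete, here is a minimal sketch of that formula with toy 2-dimensional vectors. It is not Prodigy's actual implementation: the helper names `cosine` and `term_score`, the use of cosine similarity, the clamping of negative similarities to zero, and the running-sum update of the reject vector are all assumptions made for illustration.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


def term_score(candidate, accept_vector, reject_vector, smoothing=0.2):
    """Hypothetical helper applying the formula quoted above."""
    # Assumption: negative similarities are clamped to 0 so the score stays in [0, 1)
    accept_score = max(cosine(candidate, accept_vector), 0.0)
    reject_score = max(cosine(candidate, reject_vector), 0.0)
    return accept_score / (accept_score + reject_score + smoothing)


# Toy vectors standing in for real word vectors
accept_vector = np.array([1.0, 0.0])  # built from accepted terms
reject_vector = np.zeros(2)           # nothing rejected yet
candidate = np.array([1.0, 0.0])

before = term_score(candidate, accept_vector, reject_vector)

# Assumption: rejecting a term adds its vector to the running reject vector
reject_vector = reject_vector + np.array([1.0, 0.0])
after = term_score(candidate, accept_vector, reject_vector)

print(round(before, 3), round(after, 3))  # 0.833 0.455
```

In this toy setup, a candidate that resembles a rejected term drops from about 0.83 to about 0.45, which is why later suggestions drift away from what you have rejected.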

Hi!

Thanks! You gave a very clear and thorough explanation.

Riding on the previous point about Simplified vs. Traditional Chinese: can I assume that by using Traditional Chinese in my initial seed list, I would be focusing more on the data sources (within spaCy's language model) that were in Traditional Chinese?

That's correct, yes. (Glad I could help!)