How is the support for languages other than English?


(Wayne Qiu) #1

First of all, really nice work!

I am curious about the support for languages other than English, especially CJK languages.

I couldn’t find any information about that in the online demo.

Thanks in advance!


(Matthew Honnibal) #2

Prodigy uses spaCy for NLP by default, but you can change this and write recipes that use any other NLP library instead.

We don’t have pre-trained NER models for CJK languages in spaCy yet, but we have segmentation for Chinese and Japanese based on third-party libraries. For text classification, I would expect everything to work fine.

I would suggest giving the CJK support in spaCy a try. If you find that works OK, you’ll probably find Prodigy works well too.
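
As a rough sketch of what plugging a different segmenter into a custom recipe could look like, the snippet below builds pre-tokenized annotation tasks from raw text. Everything named here is an assumption for illustration: `char_segment` is a hypothetical stand-in that emits one token per character (a real setup would swap in spaCy's Chinese or Japanese tokenizer, or a third-party segmenter), and `make_task` is a hypothetical helper. The `text`/`tokens` layout with `start`, `end` and `id` is the shape I'd expect a pre-tokenized task to take, so check it against the current Prodigy docs before relying on it.

```python
import json


def char_segment(text):
    """Hypothetical stand-in segmenter: one token per non-whitespace character.

    Returns a list of (start_offset, token_text) pairs. Replace this with a
    real word segmenter for anything beyond a quick experiment.
    """
    return [(i, ch) for i, ch in enumerate(text) if not ch.isspace()]


def make_task(text, segment=char_segment):
    """Build a pre-tokenized task dict from raw text (assumed task layout)."""
    tokens = []
    for token_id, (start, tok) in enumerate(segment(text)):
        tokens.append(
            {
                "text": tok,
                "start": start,             # character offset into `text`
                "end": start + len(tok),    # exclusive end offset
                "id": token_id,
            }
        )
    return {"text": text, "tokens": tokens}


if __name__ == "__main__":
    # One JSON object per line, suitable for writing to a JSONL file.
    print(json.dumps(make_task("我爱 北京"), ensure_ascii=False))
```

Because the segmenter is passed in as a plain function, trying a different library only means writing another function that returns the same `(start, text)` pairs.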

(Ines Montani) #3

To add to @honnibal’s comment above, here’s a thread that shows an example of using Prodigy with languages that spaCy doesn’t yet provide pre-trained models for (in this case, to train a Norwegian text classifier):

And this thread discusses using Prodigy to add NER and text classification capabilities to a Chinese spaCy model (which, according to the user, seems to have worked well):
