Support for Japanese NER in spaCy!

It looks like there’s a problem with pickling the tokenizer that wraps the optional external library used for Japanese tokenization. We hadn’t seen this before, but will definitely look into it.

In the meantime, I think the following workaround should avoid the problem. I haven’t tested it myself, since it’s currently hard to install MeCab on the machine I’m using, so apologies if any detail of this is incorrect.

from spacy.lang.ja import JapaneseTokenizer
import copyreg

def pickle_ja_tokenizer(instance):
    # Recreate the tokenizer from scratch on unpickling, rather than
    # trying to serialize its unpicklable internal state.
    return JapaneseTokenizer, tuple()

# Register the custom pickle handler for JapaneseTokenizer.
copyreg.pickle(JapaneseTokenizer, pickle_ja_tokenizer)

The idea here is to use the copyreg module to instruct Python on how to pickle (and therefore deep-copy) objects of this type: instead of serializing the problematic internal state, the object is simply reconstructed on load.
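To illustrate the mechanism independently of spaCy, here is a minimal, self-contained sketch of the same pattern. The `Tokenizer` class and its `native_handle` attribute are hypothetical stand-ins for an object holding unpicklable state, not part of any real library:

import copyreg
import pickle

class Tokenizer:
    """Hypothetical stand-in for an object that normally fails to pickle."""
    def __init__(self):
        # A lambda is unpicklable, mimicking a native library handle.
        self.native_handle = lambda text: text.split()

def pickle_tokenizer(instance):
    # Tell pickle to rebuild the object by calling Tokenizer() with no
    # arguments, discarding the unpicklable state entirely.
    return Tokenizer, tuple()

# Register the handler; pickle consults copyreg's dispatch table.
copyreg.pickle(Tokenizer, pickle_tokenizer)

restored = pickle.loads(pickle.dumps(Tokenizer()))
print(type(restored).__name__)  # prints "Tokenizer"

Without the `copyreg.pickle` registration, `pickle.dumps` would fail on the lambda attribute; with it, the round trip succeeds because a fresh instance is constructed on load.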