It looks like there’s a problem with pickling the tokenizer’s reference to mecab, the optional external library used for Japanese tokenization. We hadn’t seen this before, but will definitely look into it.
In the meantime, I think the following workaround should avoid the problem. I haven’t tested it myself, as I’m currently on a machine where it’s hard to install mecab, so apologies if a detail of this is incorrect.
```python
import copyreg

from spacy.lang.ja import Japanese, JapaneseTokenizer

def pickle_ja_tokenizer(instance):
    # Reconstruct the tokenizer from scratch on unpickling instead of
    # pickling its mecab handle; the constructor expects the defaults class.
    return JapaneseTokenizer, (Japanese.Defaults,)

copyreg.pickle(JapaneseTokenizer, pickle_ja_tokenizer)
```
The idea here is to use the copyreg module to instruct Python how to pickle (and copy) the object: instead of trying to serialize the tokenizer’s internal mecab state, which is what fails, the registered function tells pickle to build a fresh JapaneseTokenizer when the object is loaded.
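For reference, here’s a minimal sketch of how the workaround could be exercised (again untested, and it assumes mecab is installed so the Japanese tokenizer can be constructed at all):

```python
import pickle

from spacy.lang.ja import Japanese

nlp = Japanese()  # builds a JapaneseTokenizer internally

# With the reducer registered above, this should no longer choke on mecab.
data = pickle.dumps(nlp.tokenizer)
tokenizer = pickle.loads(data)  # a freshly constructed JapaneseTokenizer

print([t.text for t in tokenizer("これはテストです。")])
```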