Hi,
I wanted to train a blank Japanese model using the ner.manual command,
but I am getting an encoding error. Does anything have to be exported, as you mentioned for the English language model?
Command used:
prodigy ner.manual test jap_model_vm_1 out1.txt --label ORG
The out1.txt file contains Japanese text.
The error looks like this:
Using 1 labels: ORG
Traceback (most recent call last):
  File "/usr/local/conda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/conda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/conda3/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 178, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 55, in prodigy.core.Controller.__init__
  File "/usr/local/conda3/lib/python3.6/site-packages/toolz/itertoolz.py", line 368, in first
    return next(iter(seq))
  File "cython_src/prodigy/core.pyx", line 84, in iter_tasks
  File "cython_src/prodigy/components/preprocess.pyx", line 107, in add_tokens
  File "/usr/local/conda3/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 117, in make_doc
    return self.tokenizer(text)
  File "/usr/local/conda3/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 79, in __call__
    dtokens = detailed_tokens(self.tokenizer, text)
  File "/usr/local/conda3/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 60, in detailed_tokens
    parts = node.feature.split(',')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte
What could be the reason?