Hi
I got the same error again. I followed your previous comment and ran the "locale" command, and everything seems fine. I also checked this:
"import locale
print(locale.getlocale())"
and this is also fine. Everything is set to "en_US.UTF-8", but I still get the following error.
root@8bc7577bb360:/prodigy# locale
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
root@8bc7577bb360:/prodigy# python
Python 3.6.7 (default, Nov 16 2018, 22:39:40)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
(we use spaCy 2.0.12)
>>> nlp = spacy.load('/prodigy/data/NER')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/spacy/__init__.py", line 15, in load
return util.load_model(name, **overrides)
File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 116, in load_model
return load_model_from_path(Path(name), **overrides)
File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 156, in load_model_from_path
return nlp.from_disk(model_path)
File "/usr/local/lib/python3.6/site-packages/spacy/language.py", line 653, in from_disk
util.from_disk(path, deserializers, exclude)
File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 511, in from_disk
reader(path / key)
File "/usr/local/lib/python3.6/site-packages/spacy/language.py", line 641, in <lambda>
self.vocab.from_disk(p) and _fix_pretrained_vectors_name(self))),
File "vocab.pyx", line 376, in spacy.vocab.Vocab.from_disk
File "strings.pyx", line 215, in spacy.strings.StringStore.from_disk
File "strings.pyx", line 248, in spacy.strings.StringStore._reset_and_load
File "strings.pyx", line 130, in spacy.strings.StringStore.add
File "strings.pyx", line 21, in spacy.strings.hash_string
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf2' in position 0: surrogates not allowed
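
In case it helps with debugging, here is a small sketch that reproduces the same kind of error outside spaCy. It assumes (I have not confirmed this) that the '\udcf2' character comes from a raw byte 0xF2 that was decoded with Python's "surrogateescape" error handler somewhere along the way. The variable names are only for illustration. It also prints the encodings Python itself reports, which may differ from what the "locale" command shows.

```python
import locale
import sys

# Encodings as seen from inside Python (may differ from the shell's locale output).
print("preferred encoding: ", locale.getpreferredencoding())
print("filesystem encoding:", sys.getfilesystemencoding())
print("stdout encoding:    ", sys.stdout.encoding)

# Assumption: the bad character started out as the single byte 0xF2,
# which is not valid UTF-8 on its own.
raw = b"\xf2"

# Decoding with "surrogateescape" maps the undecodable byte to a lone surrogate.
escaped = raw.decode("utf-8", errors="surrogateescape")
print(repr(escaped))

# Strict UTF-8 encoding of a lone surrogate raises UnicodeEncodeError.
try:
    escaped.encode("utf-8")
except UnicodeEncodeError as err:
    print(err)

# Encoding with the same error handler restores the original byte.
restored = escaped.encode("utf-8", errors="surrogateescape")
print(restored == raw)
```

If this guess is right, some string in the model's vocab data (or a path read from the system) contains a byte that is not valid UTF-8, rather than the locale itself being wrong.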