Can't even start Prodigy because of a Unicode problem

Following the first step of the tutorial (creating the database), I get:
$ prodigy dataset asdf3 "asdf3"
Traceback (most recent call last):
  File "/teza/home/daniyar/.conda/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/teza/home/daniyar/.conda/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/teza/home/daniyar/.conda/envs/py3/lib/python3.6/site-packages/prodigy/__main__.py", line 230, in <module>
    plac.call(commands[command], arglist=args, eager=False)
  File "/teza/home/daniyar/.conda/envs/py3/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/teza/home/daniyar/.conda/envs/py3/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/teza/home/daniyar/.conda/envs/py3/lib/python3.6/site-packages/prodigy/__main__.py", line 48, in dataset
    .format(set_id, DB.db_name))
  File "cython_src/prodigy/util.pyx", line 349, in prodigy.util.prints
UnicodeEncodeError: 'ascii' codec can't encode character '\u2728' in position 3: ordinal not in range(128)

Whatever I try to print with prodigy.util.prints(), I get the same error:

In [1]: import prodigy
In [2]: prodigy.util.prints('asdf')

UnicodeEncodeError Traceback (most recent call last)
in ()
----> 1 prodigy.util.prints('asdf')

cython_src/prodigy/util.pyx in prodigy.util.prints()

UnicodeEncodeError: 'ascii' codec can't encode character '\u2728' in position 3: ordinal not in range(128)

Thanks for the report! This is most likely caused by your locale settings: Prodigy prints emoji and coloured text, and with an ASCII locale Python can’t encode the ✨ emoji (U+2728) in the message. It’s a good point, though, and we should probably just implement a locale_escape utility, similar to the one we now use in spaCy.
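To illustrate the idea, here is a minimal sketch of what such an escaping helper could look like. The function name `escape_for_encoding` and its parameters are hypothetical; this is not Prodigy's or spaCy's actual implementation, just one way to replace characters the output encoding can't represent instead of crashing.

```python
import locale
import sys


def escape_for_encoding(text, encoding=None, errors="replace"):
    """Return `text` with characters the target encoding can't represent replaced.

    Hypothetical helper for illustration only; not part of Prodigy or spaCy.
    """
    if encoding is None:
        # Prefer the encoding of the stream we are about to write to,
        # and fall back to the locale's preferred encoding.
        encoding = getattr(sys.stdout, "encoding", None) or locale.getpreferredencoding(False)
    # Round-trip through the target encoding; with errors="replace",
    # unencodable characters become "?" rather than raising UnicodeEncodeError.
    return text.encode(encoding, errors).decode(encoding)


# With an ASCII target, the sparkles emoji is replaced by "?".
print(escape_for_encoding("\u2728 done", encoding="ascii"))  # prints "? done"
```

With a UTF-8 target the text passes through unchanged, so the emoji would still show up on terminals that support it.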

Does the following help?

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

(I don’t remember whether LANG was necessary or not, so you might even be able to leave it out.)
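To check whether the exports took effect, a quick sketch like the following can be run in the same shell before and after setting the variables. The helper `can_encode` is a hypothetical name used here for illustration; the rest uses only the standard library.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# sys.stdout.encoding can be None when output is redirected in some setups,
# so fall back to ASCII as the most conservative assumption.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"

print("locale preferred encoding:", locale.getpreferredencoding(False))
print("stdout encoding:", stdout_encoding)
print("can print U+2728:", can_encode("\u2728", stdout_encoding))
```

If the last line reports False, Python is still using an encoding that can't represent the emoji, and the error above would be expected to persist.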