Hi,
I wanted to try the train-curve recipe, but I get the following error when I run it:
```
Traceback (most recent call last):
  File "C:\Users\x\AppData\Local\Programs\Python\Python37\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\x\AppData\Local\Programs\Python\Python37\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\x\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\prodigy\__main__.py", line 54, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "C:\Users\x\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "C:\Users\x\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Users\x\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\prodigy\recipes\train.py", line 331, in train_curve
    config, gpu_id=gpu_id, overrides=overrides, silent=True
  File "C:\Users\x\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\prodigy\recipes\train.py", line 172, in _train
    spacy_train(nlp, output_path, use_gpu=gpu_id, stdout=stdout)
  File "C:\Users\x\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\spacy\training\loop.py", line 91, in train
    stdout.write(msg.info(f"Pipeline: {nlp.pipe_names}") + "\n")
  File "C:\Users\x\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2139' in position 0: character maps to <undefined>
```
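The last frame shows where it breaks: the output line is being encoded with cp1252, a legacy Windows code page that has no mapping for U+2139 ("ℹ"), the character at position 0 of the msg.info() output. A minimal sketch in plain Python (no Prodigy or spaCy involved; the helper name `encode_error` is made up for illustration) reproduces the same exception:

```python
from typing import Optional

# U+2139 INFORMATION SOURCE ("ℹ"), the character the traceback reports at position 0
INFO_SYMBOL = "\u2139"


def encode_error(text: str, encoding: str) -> Optional[str]:
    """Return the UnicodeEncodeError message for encoding `text`, or None if it succeeds."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# cp1252 has no mapping for U+2139, so this prints a 'charmap' codec error message
print(encode_error(INFO_SYMBOL + " Pipeline", "cp1252"))
# UTF-8 can represent U+2139, so this prints None
print(encode_error(INFO_SYMBOL + " Pipeline", "utf-8"))
```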
I'm mostly posting this to report the issue. I don't get this error anywhere else (training, eval, etc. all run fine on the exact same dataset), so I think it might be a bug, or at least something that is handled more gracefully elsewhere.
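Since the failing character comes from the log formatting rather than from the data, what matters is the encoding of the stream being written to. The sketch below writes a stand-in for that log line to in-memory streams with different settings: a strict cp1252 stream fails like in the traceback, while a UTF-8 stream or one opened with `errors="replace"` does not. The pipe names and the `write_to_stream` helper are hypothetical, and this is not how Prodigy or spaCy actually create their output stream; which stream Prodigy passes as `stdout` when `silent=True` isn't visible from the traceback.

```python
import io

# Stand-in for the line spaCy's training loop writes; the pipe names are made up.
LINE = "\u2139 Pipeline: ['ner']\n"


def write_to_stream(encoding: str, errors: str = "strict") -> bytes:
    """Write LINE to an in-memory text stream and return the bytes that reached it.

    With errors="strict" (the default for open()), an encoding that can't represent
    U+2139, such as cp1252, raises UnicodeEncodeError just like in the traceback.
    """
    raw = io.BytesIO()
    # newline="\n" disables newline translation so the output is the same on every OS
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    stream.write(LINE)
    stream.flush()
    return raw.getvalue()


print(write_to_stream("utf-8"))                     # UTF-8 bytes, no error
print(write_to_stream("cp1252", errors="replace"))  # b"? Pipeline: ['ner']\n"
```

On Python 3.7+, setting the environment variable `PYTHONUTF8=1` enables UTF-8 mode, which makes `open()` default to UTF-8; whether that avoids this particular error depends on how the stream is created.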
Since it contains proprietary data, I won't be able to provide the dataset.