I started with an already trained NER model with 6 entity types. It was trained on a large data set of about 9 million words and over 250,000 entities. To see how much Prodigy could improve the model, and how well it handles our annotations, we ran a test with ner.teach. Just running ner.teach fails with the error below. Keep in mind that I checked the spaCy model locally and it was fine: it loaded in Python 2.7.13 without any problem and found all the entities. We also tested it on different texts and loaded it several times without any issue. I also tried loading a spaCy model with an empty NER into Prodigy and, surprisingly, that works, so the problem seems to be specific to my model inside Prodigy. Since the model loads and works fine locally, I am confused about where the problem might be. Could you please take a look? Here is the error:
Using 6 labels: .....
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 167, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/usr/local/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 90, in teach
    nlp = spacy.load(spacy_model)
  File "/usr/local/lib/python3.6/site-packages/spacy/__init__.py", line 15, in load
    return util.load_model(name, **overrides)
  File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 116, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 156, in load_model_from_path
    return nlp.from_disk(model_path)
  File "/usr/local/lib/python3.6/site-packages/spacy/language.py", line 653, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/usr/local/lib/python3.6/site-packages/spacy/util.py", line 511, in from_disk
    reader(path / key)
  File "/usr/local/lib/python3.6/site-packages/spacy/language.py", line 641, in <lambda>
    self.vocab.from_disk(p) and _fix_pretrained_vectors_name(self))),
  File "vocab.pyx", line 376, in spacy.vocab.Vocab.from_disk
  File "strings.pyx", line 215, in spacy.strings.StringStore.from_disk
  File "strings.pyx", line 248, in spacy.strings.StringStore._reset_and_load
  File "strings.pyx", line 130, in spacy.strings.StringStore.add
  File "strings.pyx", line 21, in spacy.strings.hash_string
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf2' in position 0: surrogates not allowed
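The last line of the traceback can be reproduced in plain Python 3 without spaCy or Prodigy, which may help narrow things down. The sketch below is only a guess at the cause, not something confirmed from the model files: it assumes that a raw byte 0xF2 ended up in a string that Python 3 decoded with the surrogateescape error handler. The variable names are illustrative.

```python
# Assumption: the lone surrogate '\udcf2' stands for a raw byte 0xF2 that
# could not be decoded as UTF-8 (0xF2 is the letter 'ò' in Latin-1).
raw = b"\xf2"

# With the 'surrogateescape' error handler, Python 3 maps an undecodable
# byte 0xNN to the lone surrogate U+DCNN instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding that string back to strict UTF-8 fails, as in the traceback.
try:
    text.encode("utf-8")
    error_message = None
except UnicodeEncodeError as err:
    error_message = str(err)

print(repr(text))
print(error_message)
```

If this is what happened, the string itself was probably already broken before Prodigy tried to load it, and Python 3 is simply stricter about it than Python 2.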
root@39396ecbf955:/prodigy/data#
root@39396ecbf955:/prodigy/data# prodigy ner.teach swedish_ner_finance_train NER data/text/ner-swedish-finance.txt --label EVENT
Using 1 labels: EVENT
Traceback (most recent call last):
  [stack frames identical to the first traceback]
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf2' in position 0: surrogates not allowed
root@39396ecbf955:/prodigy/data# cd NER/
root@39396ecbf955:/prodigy/data/NER# ls
accuracy.json meta.json ner tokenizer vocab
root@39396ecbf955:/prodigy/data/NER# cd ..
root@39396ecbf955:/prodigy/data# prodigy ner.teach swedish_ner_finance_train NER data/text/ner-swedish-finance.txt --patterns data/pattern-files/patterns.jsonl --label
Using 6 labels: EVENT, MORTGAGE_LOAN, CUSTOMER_SITUATION, PRODUCT, COMPANY, PERSON
Traceback (most recent call last):
  [stack frames identical to the first traceback]
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf2' in position 0: surrogates not allowed
root@39396ecbf955:/prodigy/data# ls
NER pattern-files prodigy.db text
root@39396ecbf955:/prodigy/data# prodigy ner.teach swedish_ner_finance_train NER/ data/text/ner-swedish-finance.txt --patterns data/pattern-files/patterns.jsonl --label EVENT,MORTGAGE_LOAN,CUSTOMER_SITUATION,PRODUCT,COMPANY,PERSON
Using 6 labels: EVENT, MORTGAGE_LOAN, CUSTOMER_SITUATION, PRODUCT, COMPANY, PERSON
Traceback (most recent call last):
  [stack frames identical to the first traceback]
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf2' in position 0: surrogates not allowed
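Since the failure happens in StringStore.add while the vocab is being loaded, one way to look for the offending entry might be to scan the saved string list for lone surrogates. This is a rough sketch and not part of spaCy's API: has_lone_surrogate, find_surrogate_strings and load_string_list are hypothetical helpers, and the path NER/vocab/strings.json (holding a JSON list of strings) is an assumption about how this model was saved.

```python
import json
from pathlib import Path


def has_lone_surrogate(text):
    """Return True if text contains a code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def find_surrogate_strings(strings):
    """Return (index, repr) pairs for entries that strict UTF-8 cannot encode."""
    return [(i, repr(s)) for i, s in enumerate(strings) if has_lone_surrogate(s)]


def load_string_list(path):
    """Hypothetical loader: assumes the file holds a JSON list of strings."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Assumed location of the saved strings; adjust to the real model layout.
    strings_path = Path("NER/vocab/strings.json")
    if strings_path.exists():
        for index, shown in find_surrogate_strings(load_string_list(strings_path)):
            print(index, shown)
```

Any entry this prints would be a candidate for the string that hash_string cannot encode.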
Thanks!