Hi, I ran into the same kind of error when using sense2vec.teach, but the previous solution for terms.teach doesn't seem to work here. I first set up my nlp from en_core_web_lg, then added the s2v_reddit_2019_lg component (downloaded from your website) to the pipeline:
import spacy
from sense2vec import Sense2Vec, Sense2VecComponent

nlp = spacy.load("en_core_web_lg")
s2v = Sense2VecComponent(nlp.vocab).from_disk("../Prodigy_anotation/s2v_reddit_2019_lg")
nlp.add_pipe(s2v)  # register the component in the pipeline
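Loading itself seems fine. A quick check along the lines of the usage example in the sense2vec README works (the ._.s2v_freq / ._.s2v_most_similar extension attributes are from those docs):

doc = nlp("A sentence about natural language processing.")
span = doc[3:6]                    # "natural language processing"
print(span._.s2v_freq)             # corpus frequency of the span's key
print(span._.s2v_most_similar(3))  # top 3 most similar senses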
Then I run the sense2vec.teach command:
prodigy sense2vec.teach termsg06f "../Prodigy_anotation/s2v_reddit_2019_lg"
--seeds "electronic device user interface, control unit,...(about 6000 terms)"
It fails with this error:
Traceback (most recent call last):
File "/Users/zuoyou/opt/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/zuoyou/opt/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/zuoyou/opt/anaconda3/lib/python3.7/site-packages/prodigy/__main__.py", line 53, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 331, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "cython_src/prodigy/core.pyx", line 353, in prodigy.core._components_to_ctrl
File "cython_src/prodigy/core.pyx", line 142, in prodigy.core.Controller.__init__
File "cython_src/prodigy/components/feeds.pyx", line 56, in prodigy.components.feeds.SharedFeed.__init__
File "cython_src/prodigy/components/feeds.pyx", line 155, in prodigy.components.feeds.SharedFeed.validate_stream
File "/Users/zuoyou/opt/anaconda3/lib/python3.7/site-packages/toolz/itertoolz.py", line 376, in first
return next(iter(seq))
File "/Users/zuoyou/opt/anaconda3/lib/python3.7/site-packages/sense2vec/prodigy_recipes.py", line 113, in get_stream
most_similar = s2v.most_similar(accept_keys, n=n_similar)
File "/Users/zuoyou/opt/anaconda3/lib/python3.7/site-packages/sense2vec/sense2vec.py", line 232, in most_similar
result = [(self.strings[key], score) for key, score in result if key]
File "/Users/zuoyou/opt/anaconda3/lib/python3.7/site-packages/sense2vec/sense2vec.py", line 232, in <listcomp>
result = [(self.strings[key], score) for key, score in result if key]
File "strings.pyx", line 136, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '12141139319163549496'. This usually refers to an issue with the `Vocab` or `StringStore`."
I tried the previous method of removing the missing keys. Roughly what I did (a sketch: I rebuilt the table with only the keys that resolve to a string and swapped it into the component, again assuming the .s2v attribute from the source):
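from sense2vec import Sense2Vec

old = s2v.s2v
fixed = Sense2Vec(shape=old.vectors.shape)
for key, vector in old.vectors.items():
    try:
        string = old.strings[key]
    except KeyError:
        continue  # drop keys whose hash has no string (the E018 ones)
    # re-adding by string should restore the StringStore entry
    fixed.add(string, vector, old.get_freq(key))
s2v.s2v = fixed  # swap the cleaned table into the component (freqs/senses config not handled here)

But when I then want to save the new nlp to my current directory, it raises: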
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-47993abdb9e6> in <module>
----> 1 nlp.to_disk("../Prodigy_anotation/s2v_reddit_2019_lg_fixed/")
~/opt/anaconda3/lib/python3.7/site-packages/spacy/language.py in to_disk(self, path, exclude, disable)
925 serializers[name] = lambda p, proc=proc: proc.to_disk(p, exclude=["vocab"])
926 serializers["vocab"] = lambda p: self.vocab.to_disk(p)
--> 927 util.to_disk(path, serializers, exclude)
928
929 def from_disk(self, path, exclude=tuple(), disable=None):
~/opt/anaconda3/lib/python3.7/site-packages/spacy/util.py in to_disk(path, writers, exclude)
679 # Split to support file names like meta.json
680 if key.split(".")[0] not in exclude:
--> 681 writer(path / key)
682 return path
683
~/opt/anaconda3/lib/python3.7/site-packages/spacy/language.py in <lambda>(p, proc)
923 if not hasattr(proc, "to_disk"):
924 continue
--> 925 serializers[name] = lambda p, proc=proc: proc.to_disk(p, exclude=["vocab"])
926 serializers["vocab"] = lambda p: self.vocab.to_disk(p)
927 util.to_disk(path, serializers, exclude)
TypeError: to_disk() got an unexpected keyword argument 'exclude'
I don't understand why this happens. Could you please help me check it? Thank you.