@honnibal,
Hi Matthew, yes, I am running a local fork of spaCy because I need some custom tokenization for my work. As far as I can tell (I could not find any information saying otherwise), spaCy's regex matching via `TOKEN_MATCH` is restricted to a single regex. I need to identify a number of different patterns, so I modified `tokenizer.pyx` in spaCy to accept an iterable being passed as `TOKEN_MATCH`. I should say that I comment out this change and go back to the original whenever I need to use Prodigy, because Prodigy (understandably) does not work with the modification. That is the reason behind the local fork of spaCy.
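For context, the change boils down to something like the sketch below: instead of `TOKEN_MATCH` being the `.match` of a single compiled regex, the tokenizer tries a list of patterns in turn. The patterns here are just placeholders, not my real ones:

```python
import re

# Placeholder patterns -- my real ones are domain-specific.
CUSTOM_TOKEN_PATTERNS = [
    re.compile(r"\d+(?:\.\d+)?[A-Za-z]+"),  # e.g. "2.5mg"
    re.compile(r"[A-Z]{2,}-\d+"),           # e.g. "AB-1234"
]

def multi_token_match(text):
    """Keep the string as a single token if any of the patterns matches it."""
    for pattern in CUSTOM_TOKEN_PATTERNS:
        match = pattern.match(text)
        if match:
            return match
    return None
```

In my fork, `tokenizer.pyx` essentially calls a function like this wherever it would otherwise call the single `TOKEN_MATCH` regex.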
Coming to the problem: I changed the spaCy installation to a local install in the same Python virtual environment where Prodigy is running, and I still get the same error. I have also cleared out `PYTHONPATH`.
Traceback (most recent call last):
  File "test_code.py", line 3, in <module>
    nlp = spacy.load('/tmp/model')
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/spacy/__init__.py", line 21, in load
    return util.load_model(name, **overrides)
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/spacy/util.py", line 116, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/spacy/util.py", line 156, in load_model_from_path
    return nlp.from_disk(model_path)
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/spacy/language.py", line 647, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/spacy/util.py", line 511, in from_disk
    reader(path / key)
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/spacy/language.py", line 643, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
  File "pipeline.pyx", line 643, in spacy.pipeline.Tagger.from_disk
  File "/Users/Abhishek/Projects/Python-Projects/Python-VEs/NM-NLP/lib/python3.6/site-packages/spacy/util.py", line 511, in from_disk
    reader(path / key)
  File "pipeline.pyx", line 625, in spacy.pipeline.Tagger.from_disk.load_model
  File "pipeline.pyx", line 534, in spacy.pipeline.Tagger.Model
ValueError: [T008] Bad configuration of Tagger. This is probably a bug within spaCy. We changed the name of an internal attribute for loading pre-trained vectors, and the class has been passed the old name (pretrained_dims) but not the new name (pretrained_vectors).
It would be extremely helpful if you could point me to a way to write a custom tokenizer with multiple pattern-matching capabilities, so that I can avoid this problem entirely.
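In case it helps to see what I have in mind: based on my (possibly incomplete) reading of the public `Tokenizer` API, something along the following lines is what I am hoping is possible without a fork. The patterns and the model name are placeholders, and I have not verified this against the current spaCy version:

```python
import re
import spacy
from spacy.tokenizer import Tokenizer
from spacy.util import (compile_infix_regex, compile_prefix_regex,
                        compile_suffix_regex)

# Same idea as the sketch above: try several patterns instead of one.
CUSTOM_TOKEN_PATTERNS = [
    re.compile(r"\d+(?:\.\d+)?[A-Za-z]+"),
    re.compile(r"[A-Z]{2,}-\d+"),
]

def multi_token_match(text):
    for pattern in CUSTOM_TOKEN_PATTERNS:
        match = pattern.match(text)
        if match:
            return match
    return None

def custom_tokenizer(nlp):
    # Keep the language defaults for prefixes/suffixes/infixes and only
    # swap in the multi-pattern token_match.
    prefix_re = compile_prefix_regex(nlp.Defaults.prefixes)
    suffix_re = compile_suffix_regex(nlp.Defaults.suffixes)
    infix_re = compile_infix_regex(nlp.Defaults.infixes)
    return Tokenizer(nlp.vocab,
                     rules=nlp.Defaults.tokenizer_exceptions,
                     prefix_search=prefix_re.search,
                     suffix_search=suffix_re.search,
                     infix_finditer=infix_re.finditer,
                     token_match=multi_token_match)

nlp = spacy.load("en_core_web_sm")  # placeholder model name
nlp.tokenizer = custom_tokenizer(nlp)
```

If replacing `nlp.tokenizer` like this is the supported route, I would happily drop the fork and go back to a stock spaCy install.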