I've been running pretraining experiments with different pretrained vectors (word2vec, fastText, etc.) and text categorization models. Until today, I haven't had any issues using `prodigy train textcat -t2v ...` with weights produced by `spacy pretrain`.
However, when attempting to train my latest experiment with the following command:
python -m prodigy train textcat pslic_textcat_dedup,pflic_textcat_dedup en_fasttext_1m -o d:/fasttext_1m_ps -n 20 -t2v ./fasttext_pretrain_model4.bin -es 0.4 -d 0.5
I get the following error:
Loaded model 'en_fasttext_1m'
Traceback (most recent call last):
File "d:\Anaconda3\envs\python37\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\Anaconda3\envs\python37\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\ChristopherRogers\AppData\Local\pypoetry\Cache\virtualenvs\jobtitles-Naq776M9-py3.8\lib\site-packages\prodigy\__main__.py", line 60, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src\prodigy\core.pyx", line 300, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "C:\Users\ChristopherRogers\AppData\Local\pypoetry\Cache\virtualenvs\jobtitles-Naq776M9-py3.8\lib\site-packages\plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "C:\Users\ChristopherRogers\AppData\Local\pypoetry\Cache\virtualenvs\jobtitles-Naq776M9-py3.8\lib\site-packages\plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "C:\Users\ChristopherRogers\AppData\Local\pypoetry\Cache\virtualenvs\jobtitles-Naq776M9-py3.8\lib\site-packages\prodigy\recipes\train.py", line 92, in train
read_pretrain_hyper_params(init_tok2vec, component, require=True)
File "cython_src\prodigy\util.pyx", line 573, in prodigy.util.read_pretrain_hyper_params
AttributeError: 'bytes' object has no attribute 'get'
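The error itself is easy to reproduce in isolation. My guess from the traceback (not verified against prodigy's source, since `read_pretrain_hyper_params` is compiled Cython) is that it ends up calling `.get()` on the raw bytes of the weights file instead of on a parsed config dict:

```python
# Minimal reproduction of the failure mode in the traceback.
# Assumption: read_pretrain_hyper_params expects a dict-like config,
# but receives the raw byte contents of the .bin weights file instead.
raw_weights = b"\x00\x01\x02"  # stand-in for the bytes read from the .bin file

try:
    raw_weights.get("conv_depth")  # dict-style access on a bytes object
except AttributeError as err:
    print(err)  # 'bytes' object has no attribute 'get'
```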
I'm using Poetry to create the following environment:
python = "^3.7"
pandas = "^1.1.0"
scipy = "^1.5.2"
numpy = "^1.19.1"
sagemaker = "^1.72.0"
s3fs = "^0.4.2"
cupy-cuda102 = "^7.7.0"
spacy = {version = "2.3.1", extras = ["cuda102", "lookups"]}
prodigy = {path = "prodigy-1.10.3-cp36.cp37.cp38-cp36m.cp37m.cp38-win_amd64.whl"}
en_fasttext_1m = {path = "en_fasttext_1m-2.3.0.tar.gz"}
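One thing that stands out in the traceback is that `runpy.py` is loading from my Anaconda `python37` env while prodigy comes from the Poetry `py3.8` virtualenv. To rule out a mixed environment, a generic sanity check (nothing prodigy-specific) is to print which interpreter the command actually runs under:

```python
import sys

# Show which interpreter and environment are actually in use, to confirm
# the Poetry virtualenv is running rather than the conda env.
print(sys.executable)  # path to the running interpreter
print(sys.version)     # interpreter version string
print(sys.prefix)      # root of the active environment
```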
Pretraining occurred on an AWS p3.2xlarge instance with the following settings:
pretrain(texts_loc, vectors_model, output_dir, n_iter=10, min_length=1, max_length=50, seed=1337,
n_save_every=1, sa_depth=2, bilstm_depth=2, width=300, dropout=0.3, batch_size=5000, conv_depth=6)
What am I missing?