Medium & large models for Prodigy 0.5.0?

Are the two larger models (en_core_web_md and en_core_web_lg) supposed to work with Prodigy 0.5.0? I just installed 0.5.0 and tried to run it with the larger models, but I get the following error and I'm not sure what to make of it:

prodigy ner.teach en_ner_prod050 en_core_web_lg …/traindata_NER1.txt --label LOC

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/prodigy/__main__.py", line 238, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 143, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/util.pyx", line 173, in prodigy.util.suggest_view_id
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/toolz/itertoolz.py", line 368, in first
    return next(iter(seq))
  File "cython_src/prodigy/components/sorters.pyx", line 127, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 53, in genexpr
  File "cython_src/prodigy/models/ner.pyx", line 215, in __call__
  File "cython_src/prodigy/models/ner.pyx", line 185, in get_tasks
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "cython_src/prodigy/models/ner.pyx", line 151, in predict_spans
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "cython_src/prodigy/components/preprocess.pyx", line 12, in split_sentences
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/spacy/language.py", line 531, in pipe
    for doc, context in izip(docs, contexts):
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/spacy/language.py", line 554, in pipe
    for doc in docs:
  File "nn_parser.pyx", line 369, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "nn_parser.pyx", line 369, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "pipeline.pyx", line 397, in pipe
  File "pipeline.pyx", line 402, in spacy.pipeline.Tagger.predict
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/neural/_classes/model.py", line 161, in __call__
    return self.predict(x)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 55, in predict
    X = layer(X)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/neural/_classes/model.py", line 161, in __call__
    return self.predict(x)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 293, in predict
    X = layer(layer.ops.flatten(seqs_in, pad=pad))
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/neural/_classes/model.py", line 161, in __call__
    return self.predict(x)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 55, in predict
    X = layer(X)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/neural/_classes/model.py", line 161, in __call__
    return self.predict(x)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/neural/_classes/model.py", line 125, in predict
    y, _ = self.begin_update(X)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 372, in uniqued_fwd
    Y_uniq, bp_Y_uniq = layer.begin_update(X[ind], drop=drop)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 61, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 176, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 176, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 176, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 176, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "/home/prodigy/virtualenv_prodigy_0.5.0/lib/python3.5/site-packages/thinc/neural/_classes/static_vectors.py", line 67, in begin_update
    dotted = self.ops.batch_dot(vectors, self.W)
  File "ops.pyx", line 299, in thinc.neural.ops.NumpyOps.batch_dot
ValueError: shapes (168,0) and (300,128) not aligned: 0 (dim 1) != 300 (dim 0)

Yes, they’re supposed to work. It looks like I made a mistake here, thanks.

OK, I see. Do you expect this to be fixed in the foreseeable future, or in the next update to Prodigy? Not being impatient here (OK, maybe a little ;) ), just trying to figure out whether I should continue working with the small model or wait for the larger ones.

It’ll definitely be fixed in the next update, which should be out next week. In the meantime, maybe try editing the prodigy/recipes/ner.py file and adding the following line after creating the model:

model.orig_nlp = model.nlp

I’m not 100% sure, but I think this might work around the problem for now.

I’ve inserted the statement after the lines that read:

# Create the model, using a pre-trained spaCy model.
model = EntityRecognizer(spacy.load(spacy_model), label=label)

I hope that was where you meant. Unfortunately, I get the same problem as before (sm works, but md and lg do not). The error message looks the same, except the unaligned shapes are now given as:

ValueError: shapes (294,0) and (300,128) not aligned: 0 (dim 1) != 300 (dim 0)

Don’t know if that has any significance…?
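
For anyone reading the two error messages side by side: the message format is the one NumPy uses for a failed matrix product, and the part that matters is the 0 in the first shape, not the 168 or 294. A minimal sketch of this, assuming the left operand is a batch of input vectors and the right one is a weight matrix that expects 300-dimensional input (the arrays below are illustrative stand-ins, not thinc's actual data):

```python
import numpy as np

# Hypothetical stand-in for the layer weights: 300-wide input, 128-wide output.
W = np.zeros((300, 128))

messages = []
for rows in (168, 294):
    # Zero-width input, as if the model had no vector data to look up.
    vectors = np.zeros((rows, 0))
    try:
        np.dot(vectors, W)
    except ValueError as err:
        messages.append(str(err))

for message in messages:
    print(message)

# With 300-wide rows the product is defined, whatever the row count.
ok = np.dot(np.zeros((168, 300)), W)
print(ok.shape)  # (168, 128)
```

The row count presumably just reflects how much text ended up in the batch, so it can change from run to run, while the zero width suggests the vectors never reached the model.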

Sorry about this bug! Just pushed a new update to spaCy that fixes a problem with pickling the vectors. This is what caused the above error – Prodigy’s EntityRecognizer model keeps a copy of the original model using copy.deepcopy, but the broken pickling meant all vectors were zero.
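
To illustrate why a pickling bug can surface through copy.deepcopy: for objects without their own `__deepcopy__`, deepcopy falls back on the same `__getstate__` / `__setstate__` hooks that pickle uses. The sketch below uses a made-up `ToyVectors` class with a deliberately broken `__getstate__`. It is not spaCy's actual vectors class, only a toy that shows the general mechanism:

```python
import copy
import pickle


class ToyVectors:
    """Hypothetical stand-in for a vectors table; not spaCy's actual class."""

    def __init__(self, rows, width):
        self.width = width
        self.data = [[0.1] * width for _ in range(rows)]

    @property
    def shape(self):
        return (len(self.data), len(self.data[0]) if self.data else 0)

    def __getstate__(self):
        # Deliberate bug: the vector data is left out of the saved state.
        return {"width": self.width}

    def __setstate__(self, state):
        self.width = state["width"]
        # Nothing was saved for the rows, so the restored table is empty.
        self.data = state.get("data", [])


original = ToyVectors(rows=3, width=300)
via_pickle = pickle.loads(pickle.dumps(original))
via_deepcopy = copy.deepcopy(original)

print(original.shape)      # (3, 300)
print(via_pickle.shape)    # (0, 0)
print(via_deepcopy.shape)  # (0, 0)
```

Both copies come back without any vector data, even though the original object is intact.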

You should be able to simply upgrade spaCy to the latest version, v2.0.5, in your Prodigy environment:

pip install -U spacy

(While Prodigy is still in beta, we’ve pinned it to exact spaCy versions to prevent compatibility issues. So if you end up reinstalling Prodigy, the pinned spaCy version will be installed again and you’ll have to upgrade spaCy manually until we release a new version of Prodigy with an updated version pin.)
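
Since a reinstall can silently bring back the older pinned spaCy, it may help to check which version is actually installed in the environment. A small sketch, assuming Python 3.8+ for `importlib.metadata` (the Python 3.5 environment in the traceback above would need `pip show spacy` or `python -m spacy info` instead). The helper names `installed_version` and `at_least` are made up for this example:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def at_least(found, minimum):
    """Compare the leading numeric release parts of two version strings."""
    def parts(v):
        return tuple(int(n) for n in re.findall(r"\d+", v)[:3])
    return parts(found) >= parts(minimum)


found = installed_version("spacy")
if found is None:
    print("spaCy is not installed in this environment")
elif at_least(found, "2.0.5"):
    print("spaCy", found, "is at least v2.0.5")
else:
    print("spaCy", found, "predates v2.0.5; run: pip install -U spacy")
```

The comparison only looks at the first three numeric components, so it is a rough check rather than a full version parser.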