Misaligned dims when loading jsonl data with large model?

Consider an input file out.jsonl whose content is {"text": "foo"}.

When I run prodigy ner.teach my_data en_core_web_lg out.jsonl, I get the following error:

ValueError: shapes (2,0) and (300,128) not aligned: 0 (dim 1) != 300 (dim 0)

Am I doing something wrong? I have been following the getting started docs up to here.

I’m using prodigy==0.5.0 on OS X.

Interesting to note: it works as expected with en_core_web_sm.

Thanks for the report and sorry about this! I think the problem might be related to this issue, which only affected the md and lg models, because only those models contain word vectors.
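To illustrate what the error message is saying, here is a minimal sketch using plain NumPy. It assumes the (2,0) shape comes from a vectors table that loaded with zero-width rows and that (300,128) is a weight matrix expecting 300-dimensional vectors; the variable names are made up for the example and are not spaCy internals.

```python
import numpy as np

# Hypothetical stand-ins for the shapes in the error message:
# two tokens whose vectors have width 0, and a 300 -> 128 weight matrix.
empty_vectors = np.zeros((2, 0))
weights = np.zeros((300, 128))

message = None
try:
    # The inner dimensions (0 and 300) do not match, so this raises.
    empty_vectors.dot(weights)
except ValueError as err:
    message = str(err)
    print(message)

# With properly loaded 300-dimensional vectors the product is well defined.
full_vectors = np.zeros((2, 300))
result = full_vectors.dot(weights)
print(result.shape)
```

The sm models ship without word vectors, so under this reading they never reach the failing multiplication.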

This should (hopefully) be fixed in spaCy v2.0.4, which we just released today. The Prodigy beta is pinned to the exact spaCy version, so you’ll have to upgrade manually:

pip install -U spacy
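After upgrading, it can help to confirm which spaCy version the active environment actually picks up. A small sketch using only the standard library (Python 3.8+); the helper name installed_version is just for illustration:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints something like "2.0.4" if spaCy is installed in this environment.
print(installed_version("spacy"))
```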

Thanks for the quick response! I’ve updated spaCy to 2.0.4 and I’m getting a new error:

AttributeError: 'NoneType' object has no attribute 'data'

To be clear, when you say that the Prodigy beta is pinned to the spaCy version, I suppose this means I’ll need to pull the newest version of Prodigy as well.

Is there a chance of trading in Prodigy downloads for other platforms, so as not to quickly exhaust the limit of 10 downloads per platform imposed by the beta-serving site? I definitely understand if there’s not – I am just very excited to stay as current as possible until the 1.0.0 release. :slight_smile:

Edit: the updated Prodigy isn’t in sendowl yet. Goodbye, another download. :sob:

Interesting – could you post more of the stack trace for this error? I just want to make sure it’s related to the model loading and not something else.

It’s possible though that we might need to adjust something in Prodigy as well to fix the vector loading. So in the meantime, you’ll have to use the sm models – sorry!

The new spaCy version was just released today and we haven’t fully tested it with Prodigy yet – so there’s currently no version of Prodigy that’s “officially” compatible with spaCy v2.0.4. We’re working on it, though! (Actually, we’re very close to Prodigy v1.0.0 now, so the next version might even be the Prodigy stable :tada: )

Whenever we publish a new update, we usually reset the download limit for all users :blush: We can also always manually reset the downloads if you end up exhausting the limit by accident (due to a network error or something like that).

If I’ve said it once, I’ve said it a million times: you all are saints.

Here’s a script to repro:



echo '{"text": "foo"}\n' > out.jsonl

prodigy ner.teach "$DATASET_NAME" en_core_web_lg out.jsonl

I ran it with ./test.sh test 'Some dumb test' – this will give you the full stacktrace:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/site-packages/prodigy/__main__.py", line 238, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 130, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/site-packages/prodigy/recipes/ner.py", line 54, in teach
    model = EntityRecognizer(spacy.load(spacy_model), label=label)
  File "cython_src/prodigy/models/ner.pyx", line 139, in prodigy.models.ner.EntityRecognizer.__init__
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 297, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 155, in deepcopy
    y = copier(x, memo)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 243, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 155, in deepcopy
    y = copier(x, memo)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 218, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 155, in deepcopy
    y = copier(x, memo)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 223, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 223, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/Users/erippeth/.virtualenvs/nlp/lib/python3.5/copy.py", line 292, in _reconstruct
    y = callable(*args)
  File "nn_parser.pyx", line 315, in spacy.syntax.nn_parser.Parser.__init__
AttributeError: 'NoneType' object has no attribute 'data'

Thanks for the detailed example. Okay, this looks like it’s related to the same issue. Prodigy’s EntityRecognizer currently keeps a copy of the original model as copy.deepcopy(nlp). A problem with the pickling meant that the vectors weren’t copied correctly. This should have been fixed in spaCy v2.0.4 – but it looks like instead of zeros, they’re now None :thinking:
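For anyone curious how a deepcopy can end up with None where data used to be, here is a toy sketch. It does not use spaCy's actual classes: Vectors and FixedVectors are hypothetical stand-ins, and the buggy __reduce__ is written that way on purpose. copy.deepcopy rebuilds an object from whatever its reduce method returns (the callable(*args) step in the traceback above), so any state left out of that tuple is lost in the copy.

```python
import copy


class Vectors:
    """Toy stand-in for an object holding a vectors table (not spaCy's class)."""

    def __init__(self, data=None):
        self.data = data

    def __reduce__(self):
        # Deliberately buggy: tells pickle/deepcopy to rebuild the object
        # by calling Vectors() with no arguments, so data is dropped.
        return (Vectors, ())


class FixedVectors(Vectors):
    def __reduce__(self):
        # Passes the data along so the rebuilt object keeps it.
        return (FixedVectors, (self.data,))


original = Vectors(data=[[0.1, 0.2], [0.3, 0.4]])
broken_copy = copy.deepcopy(original)
print(broken_copy.data)  # the table did not survive the copy

fixed = FixedVectors(data=[[0.1, 0.2], [0.3, 0.4]])
fixed_copy = copy.deepcopy(fixed)
print(fixed_copy.data == fixed.data)
```

Any code that later touches an attribute of the missing table fails with an AttributeError on NoneType, much like the one in the stack trace.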

So for now, you’ll have to start with the en_core_web_sm models (or any of the other sm models) – sorry about that. The good news is, we might be able to fix this with a spaCy update and without requiring an update to Prodigy.


Beautiful! Absolutely no problem working only with the small model – beggars can’t be choosers. :sweat_smile: Let me know if I can help in any other way!

Just pushed another update to spaCy (v2.0.5) that fixes the problem! :tada: