Dimension mismatch with ner.match

I’m getting a dimension mismatch when using prodigy ner.match with a custom model and a patterns file created using terms.teach and exported with terms.to-patterns.

For context, I used prodigy terms.train-vectors with the en_core_web_lg model as the starting model to train a new set of 400-dim vectors on a corpus of documents related to my specific task. I then trained a classifier using textcat.batch-train and added it to my custom language model. My next step was to train a custom NER model using a bootstrapped list of seed terms, but running ner.match produces the error below.
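
To make the mismatch concrete, here’s roughly how it shows up if I inspect the model directly in Python (the path is mine and purely illustrative):

import spacy

# hypothetical path to the model produced by terms.train-vectors
nlp = spacy.load("./linkedin_model")

# the retrained vector table is 400-dim...
print(nlp.vocab.vectors.shape)   # e.g. (n_keys, 400)

# ...but the tagger/parser/ner weights carried over from en_core_web_lg
# were trained against its 300-dim vectors, so any component that feeds
# the static vectors through its model hits a shape mismatch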

prodigy: 1.6.1
spacy: 2.0.16
thinc: 6.12.0

Command:
python -m prodigy ner.match linkedin ./linkedin_model/ job_descs.json --loader jsonl --patterns skills.jsonl

Stack Trace:
Exception when serving /get_questions
Traceback (most recent call last):
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\waitress\channel.py", line 338, in service
    task.service()
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\waitress\task.py", line 169, in service
    self.execute()
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\waitress\task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\hug\api.py", line 423, in api_auto_instantiate
    return module.hug_wsgi(*args, **kwargs)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\falcon\api.py", line 244, in __call__
    responder(req, resp, **params)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\hug\interface.py", line 793, in __call__
    raise exception
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\hug\interface.py", line 766, in __call__
    self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\hug\interface.py", line 703, in call_function
    return self.interface(**parameters)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\hug\interface.py", line 100, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\prodigy\app.py", line 105, in get_questions
    tasks = controller.get_questions()
  File "cython_src\prodigy\core.pyx", line 109, in prodigy.core.Controller.get_questions
  File "cython_src\prodigy\components\feeds.pyx", line 56, in prodigy.components.feeds.SharedFeed.get_questions
  File "cython_src\prodigy\components\feeds.pyx", line 61, in prodigy.components.feeds.SharedFeed.get_next_batch
  File "cython_src\prodigy\components\feeds.pyx", line 130, in prodigy.components.feeds.SessionFeed.get_session_stream
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\toolz\itertoolz.py", line 368, in first
    return next(iter(seq))
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\prodigy\recipes\ner.py", line 62, in <genexpr>
    'stream': (eg for _, eg in model(stream)),
  File "cython_src\prodigy\models\matcher.pyx", line 140, in __call__
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\spacy\language.py", line 548, in pipe
    for doc, context in izip(docs, contexts):
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\spacy\language.py", line 572, in pipe
    for doc in docs:
  File "pipeline.pyx", line 858, in pipe
  File "cytoolz/itertoolz.pyx", line 1047, in cytoolz.itertoolz.partition_all.__next__
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\spacy\language.py", line 746, in _pipe
    for doc in docs:
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\spacy\language.py", line 746, in _pipe
    for doc in docs:
  File "nn_parser.pyx", line 367, in pipe
  File "cytoolz/itertoolz.pyx", line 1047, in cytoolz.itertoolz.partition_all.__next__
  File "nn_parser.pyx", line 367, in pipe
  File "cytoolz/itertoolz.pyx", line 1047, in cytoolz.itertoolz.partition_all.__next__
  File "pipeline.pyx", line 433, in pipe
  File "pipeline.pyx", line 438, in spacy.pipeline.Tagger.predict
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\neural\_classes\model.py", line 161, in __call__
    return self.predict(x)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 55, in predict
    X = layer(X)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\neural\_classes\model.py", line 161, in __call__
    return self.predict(x)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 293, in predict
    X = layer(layer.ops.flatten(seqs_in, pad=pad))
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\neural\_classes\model.py", line 161, in __call__
    return self.predict(x)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 55, in predict
    X = layer(X)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\neural\_classes\model.py", line 161, in __call__
    return self.predict(x)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\neural\_classes\model.py", line 125, in predict
    y, _ = self.begin_update(X)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 374, in uniqued_fwd
    Y_uniq, bp_Y_uniq = layer.begin_update(X_uniq, drop=drop)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 61, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 176, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 176, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 176, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 176, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "C:\Anaconda3\envs\linkedin\lib\site-packages\thinc\neural\_classes\static_vectors.py", line 67, in begin_update
    dotted = self.ops.batch_dot(vectors, self.W)
  File "ops.pyx", line 338, in thinc.neural.ops.NumpyOps.batch_dot
ValueError: shapes (12528,400) and (300,128) not aligned: 400 (dim 1) != 300 (dim 0)
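
If I’m reading the trace right, the (12528, 400) array is the flattened batch of token vectors (12528 tokens × my new 400-dim vectors) and (300, 128) is the weight matrix of thinc’s StaticVectors layer inside the tagger, which still expects en_core_web_lg’s 300-dim vectors.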

So it just occurred to me that the recommended approach for training new vectors is to start from the en_core_web_sm model or a blank model, as @honnibal mentions in this post.

I’m going to retrain the vectors from one of those starting points and see if that solves my issue.
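
In spaCy terms, I think the fix looks something like this sketch (my_400d_vectors is a placeholder for the retrained vectors, and the output path is made up):

import numpy
import spacy

# placeholder for the retrained 400-dim vectors: a dict mapping
# strings to float32 arrays of shape (400,)
my_400d_vectors = {"python": numpy.zeros(400, dtype="f")}

# start from a blank pipeline so that no pretrained component weights
# assume the old 300-dim vector table from en_core_web_lg
nlp = spacy.blank("en")
for word, vector in my_400d_vectors.items():
    nlp.vocab.set_vector(word, vector)

nlp.to_disk("./linkedin_vectors_model")  # hypothetical output path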

For additional context: if I load the model with spaCy and attempt to process text, I get the same dimension error. If, however, I disable the parser, tagger, and ner components when loading my custom model, I can use the rest of the language model without the dimension issue.
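
Something like this, in case it helps anyone else (paths illustrative):

import spacy

# loading the full pipeline reproduces the same ValueError as soon as
# a document goes through the tagger:
#   nlp = spacy.load("./linkedin_model")
#   doc = nlp("some text")  # shapes (n, 400) and (300, 128) not aligned

# disabling the components whose weights expect 300-dim vectors loads
# cleanly, and the rest of the pipeline works as described above
nlp = spacy.load("./linkedin_model", disable=["tagger", "parser", "ner"])
doc = nlp("Experienced Python developer with a background in NLP")
print(doc.vector.shape)   # (400,)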