Can't find model 'en_vectors_web_lg'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Hello Prodigy team,

I'm following your tutorial here

and

Could you please help me debug the error in the title? I saw you debugged it once in the past here, but I wasn't able to resolve my problem.

I'm quite sure I have the model installed and linked.

I am able to do

import spacy
spacy.load('en_core_web_lg')

but not

prodigy train ner my_data en_vectors_web_lg --init-tok2vec ./tok2vec_cd8_model289.bin --output ./tmp_model --eval-split 0.2

I also tried linking to the model directly like this:

prodigy train ner my_data /Users/Jakub/.pyenv/versions/3.7.7/lib/python3.7/site-packages/spacy/data/en/en_core_web_lg-2.3.1 --init-tok2vec ../tok2vec_cd8_model289.bin --output ./tmp_model --eval-split 0.2
✔ Loaded model
'/Users/Jakub/.pyenv/versions/3.7.7/lib/python3.7/site-packages/spacy/data/en/en_core_web_lg-2.3.1'
Created and merged data for 18 total examples
Using 15 train / 3 eval (split 20%)
Component: ner | Batch size: compounding | Dropout: 0.2 | Iterations: 10
✔ Initializing with tok2vec weights ../tok2vec_cd8_model289.bin

... but then I get:

ValueError: could not broadcast input array from shape (128) into shape (96)

Could you please help?

Best regards,
Jakub

The problem was the model path. I had to change it from this:

/Users/Jakub/.pyenv/versions/3.7.7/lib/python3.7/site-packages/spacy/data/en/en_core_web_lg-2.3.1 

to this:

/Users/Jakub/.pyenv/versions/3.7.7/lib/python3.7/site-packages/en_core_web_lg/en_core_web_lg-2.3.1
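In case it helps anyone hunting for the right directory: below is a rough sketch of how one might locate the data directory inside an installed model package instead of guessing the path by hand. The function name find_model_data_dirs is made up for this example, and it assumes the layout seen in the working path above (a versioned subdirectory inside the package) plus the presence of a meta.json file in that subdirectory. It uses only the Python standard library and does not call spaCy or Prodigy.

```python
import importlib.util
from pathlib import Path
from typing import List


def find_model_data_dirs(package_name: str) -> List[Path]:
    """Return subdirectories of an installed package that contain a meta.json.

    Hypothetical helper: it assumes the model data lives in a versioned
    subdirectory of the package, e.g. en_core_web_lg/en_core_web_lg-2.3.1.
    """
    try:
        # Looks the package up on sys.path without importing it.
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        spec = None
    if spec is None or not spec.submodule_search_locations:
        return []  # Not installed, or not a package directory.

    found = []
    for location in spec.submodule_search_locations:
        root = Path(location)
        if not root.is_dir():
            continue
        for child in sorted(root.iterdir()):
            # Assumption: a model data directory contains a meta.json file.
            if child.is_dir() and (child / "meta.json").is_file():
                found.append(child)
    return found


if __name__ == "__main__":
    for name in ("en_core_web_lg", "en_vectors_web_lg"):
        dirs = find_model_data_dirs(name)
        if dirs:
            for path in dirs:
                print(f"{name}: {path}")
        else:
            print(f"{name}: not found in this environment")
```

If a directory is printed, that is the kind of path that worked for me in the prodigy train command. If nothing is found, the package is probably not installed in the Python environment you are running from.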

The next issue, unfortunately, is a shape mismatch in the data:

ValueError: could not broadcast input array from shape (128) into shape (96)

I noticed that some people have reported a similar problem on this forum, and the common solution was to use the LG model instead of SM. But since I'm already using the LG model, I'm wondering what I'm doing wrong.

Solved. I was using the wrong model. Use en_vectors_web_lg instead:

python -m spacy download en_vectors_web_lg

For the team's reference, here is where I think it went sideways:

If you click on the model to download en_vectors_web_lg, it opens "https://spacy.io/models/en#en_vectors_web_lg" (notice the anchor #en_vectors_web_lg), but the model is not on that page. Since "LG" is the key differentiator between the models, I concentrated on finding an LG model and disregarded the rest of the model name.

Maybe this will help somebody.

Oh, thanks for the heads-up and sorry about the confusion! We previously had the vectors-only models on the same page as the pretrained core models, but then moved them to the "starter models" page (English · spaCy Models Documentation). This is a better fit, because the vectors are really just vectors you can train on top of and bootstrap your models with. But I guess we forgot to update the link in the README.