I also tried linking to the model directly like this:
prodigy train ner my_data /Users/Jakub/.pyenv/versions/3.7.7/lib/python3.7/site-packages/spacy/data/en/en_core_web_lg-2.3.1 --init-tok2vec ../tok2vec_cd8_model289.bin --output ./tmp_model --eval-split 0.2
✔ Loaded model
'/Users/Jakub/.pyenv/versions/3.7.7/lib/python3.7/site-packages/spacy/data/en/en_core_web_lg-2.3.1'
Created and merged data for 18 total examples
Using 15 train / 3 eval (split 20%)
Component: ner | Batch size: compounding | Dropout: 0.2 | Iterations: 10
✔ Initializing with tok2vec weights ../tok2vec_cd8_model289.bin
.. but then I run into a shape mismatch:
ValueError: could not broadcast input array from shape (128) into shape (96)
I noticed that some people have reported a similar problem on this forum, and the common solution was to use the LG model instead of SM. But since I'm already using the LG model, I'm wondering what I'm doing wrong.
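For readers unfamiliar with the message itself: this kind of ValueError is what NumPy raises when an array of one width is copied into a preallocated buffer of a different width. The snippet below is only a minimal sketch of that general situation, assuming NumPy is installed. It is not spaCy's or Prodigy's actual weight-loading code; `copy_weights` is a hypothetical helper, and the widths 96 and 128 are simply taken from the error above.

```python
import numpy as np


def copy_weights(target_width, source_width):
    """Copy a source vector into a preallocated target buffer, in place.

    Hypothetical helper for illustration only.
    """
    target = np.zeros(target_width, dtype="float32")
    source = np.ones(source_width, dtype="float32")
    # In-place assignment requires compatible shapes; differing widths
    # (other than a source of width 1) raise a ValueError.
    target[:] = source
    return target


# Matching widths copy cleanly.
copy_weights(96, 96)

# Mismatched widths trigger a broadcast error of the same kind.
try:
    copy_weights(96, 128)
except ValueError as err:
    print(err)
```

Read this way, the error suggests that the weights being loaded were produced for a different layer width than the one the base model expects, which is consistent with the base model being the wrong one for those weights.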
Solved. I was using the wrong model. Use en_vectors_web_lg instead:
python -m spacy download en_vectors_web_lg
For the team's reference, here is where I think it went sideways:
If you click on the model to download en_vectors_web_lg, it opens "https://spacy.io/models/en#en_vectors_web_lg". Notice the anchor #en_vectors_web_lg. But the model is not listed on that page. Since "LG" is the key differentiator between the models, I concentrated on finding an LG model and disregarded the rest of the model name.
Oh, thanks for the heads-up and sorry about the confusion! We previously had the vectors-only models on the same page as the pretrained core models, but then moved them to the "starter models" page: English · spaCy Models Documentation. This is a better fit, because the vectors are really just vectors you can train on top of and bootstrap your models with. But I guess we forgot to update the link in the README.