I'm trying to train a spaCy NER model with pretrained fastText vectors, using the commands below:
python -m spacy init vectors en my_vectors.txt ./my_vector_dir
python -m prodigy data-to-spacy ./mydata --ner mydataset1,mydataset2,mydataset3
python -m spacy train ./mydata/config.cfg --paths.train ./mydata/train.spacy --paths.dev ./data/dev.spacy
Below is an excerpt from my config.cfg file, showing the sections I edited to use the custom vectors:
[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 128
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,2500,2500,2500]
include_static_vectors = true
[initialize]
vectors = "my_vector_dir"
Running all of the above, I get this error when training the spaCy NER model:
=========================== Initializing pipeline ===========================
[2021-10-30 23:35:46,534] [INFO] Set up nlp object from config
[2021-10-30 23:35:46,545] [INFO] Pipeline: ['tok2vec', 'ner']
[2021-10-30 23:35:46,551] [INFO] Created vocabulary
[2021-10-30 23:35:46,832] [INFO] Added vectors: my_vector_dir
[2021-10-30 23:35:46,992] [INFO] Finished initializing nlp object
ValueError: Attempt to change dimension 'nI' for model 'maxout' from 288 to 384
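A quick sanity check on the numbers in that error, as a rough sketch only. It assumes that spacy.MultiHashEmbed sizes the input of its maxout layer as width × (number of attrs, plus one more block when include_static_vectors is true); that formula is my reading of the architecture, not something confirmed by the log, and the helper name expected_maxout_input is made up for illustration.

```python
from typing import Sequence


def expected_maxout_input(width: int, attrs: Sequence[str], include_static_vectors: bool) -> int:
    """Assumed nI of the maxout layer: one `width`-sized block per input."""
    # Assumption: one hash embedding per attribute, plus one extra block
    # when static vectors are concatenated in.
    return width * (len(attrs) + int(include_static_vectors))


attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]

# Values taken from the config excerpt.
print(expected_maxout_input(128, attrs, True))  # 640

# Which of the sizes in the error are whole multiples of a candidate width?
for width in (96, 128):
    print(width, [size % width == 0 for size in (288, 384)])
# 96 [True, True]
# 128 [False, True]
```

Under that assumption the excerpt would give 640, which matches neither number in the error, and 288 is not a multiple of 128 at all. So the width in effect during training may not be the 128 shown in my excerpt; 96 is only a guess that happens to fit both 3 × 96 and 4 × 96.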
Please let me know if there are any available resources or if I missed something here, thanks!