Help with training an English NER model from scratch with pretrained Gensim vectors

I am trying to train an NER model from scratch with custom labels. I have word vectors that were pretrained with Gensim on a large corpus of r/wallstreetbets data. I need help working out the workflow from here to a preliminary model.

Right now I have the vectors as well as a labeled dataset containing 270 examples in my database. I'd like to train a model using my word vectors that will then be used with the ner.correct recipe.

Any help on the steps to do this is appreciated.

Hi! If you have your vectors exported in word2vec text format from gensim (save_word2vec_format), you can initialize a base model with spacy init vectors:

python -m spacy init vectors en /path/to/vectors.vec /path/to/spacy_vectors
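
In case it helps, here is a rough sketch for sanity-checking the exported file before running the command above. It assumes the vectors were written in text (not binary) format, e.g. with gensim's `model.wv.save_word2vec_format("/path/to/vectors.vec", binary=False)`, where `model` stands for your trained Word2Vec model. `check_word2vec_text` is just a hypothetical helper name, and the toy tokens and values in the demo are made up.

```python
import tempfile
from pathlib import Path


def check_word2vec_text(path):
    """Validate a word2vec text file and return (n_vectors, width).

    Expected layout: a "<count> <width>" header line, followed by one
    line per token of the form "<token> <v1> ... <v_width>".
    """
    with Path(path).open(encoding="utf8") as f:
        header = f.readline().split()
        if len(header) != 2:
            raise ValueError("expected a '<count> <width>' header line")
        n_vectors, width = int(header[0]), int(header[1])
        n_rows = 0
        for line_no, line in enumerate(f, start=2):
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            if len(parts) != width + 1:
                raise ValueError(
                    f"line {line_no}: expected {width} values, "
                    f"found {len(parts) - 1}"
                )
            n_rows += 1
    if n_rows != n_vectors:
        raise ValueError(f"header says {n_vectors} rows, found {n_rows}")
    return n_vectors, width


# Toy file standing in for the real gensim export (made-up tokens and values).
sample = "2 3\nstonks 0.1 0.2 0.3\nhodl 0.4 0.5 0.6\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "vectors.vec"
    demo_path.write_text(sample, encoding="utf8")
    result = check_word2vec_text(demo_path)
print(result)  # (2, 3)
```

If the check raises, the file is probably in binary format or was truncated, and it's worth re-exporting before initializing the spaCy vectors.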

Then use /path/to/spacy_vectors as the base model when training with prodigy:

prodigy docs (for --base-model):

spacy docs (for spacy init vectors):

Ah, wait, I was wrong about the prodigy side of things. Only using --base-model doesn't actually enable the vectors in the new ner component while training in prodigy by default. Let me have a look...

Edited to add:

One option is to generate a config with vectors using spacy init config -o accuracy and then set the vectors location in a prodigy train override:

spacy init config -l en -p ner -o accuracy /path/to/config.cfg
prodigy train --ner dataset --config /path/to/config.cfg --initialize.vectors /path/to/spacy_vectors

This should be easier, though...