Help with training a from-scratch English NER model with pretrained Gensim vectors

Jhutton1121 · January 19, 2022, 9:25pm

I am trying to train a from-scratch NER model with custom labels. I have word vectors pretrained with Gensim on a large corpus of r/wallstreetbets data, and I need help working out the workflow from here to a preliminary model.

Right now I have the vectors as well as a labeled dataset containing 270 examples in my database. I'd like to train a model using my word vectors that will then be used with the ner.correct recipe.

Any help on the steps to do this is appreciated.

adriane · January 27, 2022, 8:56am

Hi! If you have your vectors exported from Gensim in word2vec text format (save_word2vec_format with binary=False), you can initialize a base model with spacy init vectors:

python -m spacy init vectors en /path/to/vectors.vec /path/to/spacy_vectors
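Before running init vectors, it can help to confirm that the exported file is well-formed. With Gensim 4 the export is typically model.wv.save_word2vec_format("/path/to/vectors.vec", binary=False), where model is your trained Gensim model (e.g. Word2Vec). The sketch below is a minimal, stdlib-only illustration of the text layout (a "<vocab_size> <dimension>" header, then one token and its values per line) plus a sanity check. write_word2vec_text and check_word2vec_text are hypothetical helpers written for this example, not part of Gensim or spaCy.

```python
from pathlib import Path
from typing import Dict, List, Tuple, Union


def write_word2vec_text(vectors: Dict[str, List[float]], path: Union[str, Path]) -> None:
    """Hypothetical helper: write vectors in word2vec *text* format.

    Line 1 is the header "<vocab_size> <dimension>"; every following line is
    "<word> <v1> ... <vN>", separated by single spaces.
    """
    if not vectors:
        raise ValueError("no vectors to write")
    dims = {len(v) for v in vectors.values()}
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector sizes: {sorted(dims)}")
    dim = dims.pop()
    with Path(path).open("w", encoding="utf8") as f:
        f.write(f"{len(vectors)} {dim}\n")
        for word, vec in vectors.items():
            # A space inside a token would shift every value on its row.
            if not word or " " in word:
                raise ValueError(f"invalid token: {word!r}")
            f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


def check_word2vec_text(path: Union[str, Path]) -> Tuple[int, int]:
    """Hypothetical helper: return (vocab_size, dimension) after checking that
    every row has exactly `dimension` values and the row count matches the header."""
    with Path(path).open(encoding="utf8") as f:
        header = f.readline().split()
        if len(header) != 2:
            raise ValueError(f"expected '<vocab_size> <dimension>' header, got {header}")
        n_words, dim = int(header[0]), int(header[1])
        rows = 0
        for line in f:
            # rstrip() tolerates trailing spaces, which some .vec files contain.
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                raise ValueError(
                    f"row {rows + 1}: {len(parts) - 1} values, expected {dim}"
                )
            rows += 1
    if rows != n_words:
        raise ValueError(f"header says {n_words} words, found {rows} rows")
    return n_words, dim
```

For example, check_word2vec_text("/path/to/vectors.vec") on the Gensim export should return the vocabulary size and vector width you trained with; if it raises instead, the file is probably binary or truncated and is worth re-exporting before spacy init vectors.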

Then use /path/to/spacy_vectors as the base model when training with prodigy:

prodigy docs (for --base-model): https://prodi.gy/docs/recipes#training

spacy docs (for spacy init vectors): https://spacy.io/usage/linguistic-features#adding-vectors

adriane · January 27, 2022, 9:04am

Ah, wait, I was wrong about the prodigy side of things. Only using --base-model doesn't actually enable the vectors in the new ner component while training in prodigy by default. Let me have a look...

Edited to add:

One option is to generate a config that uses static vectors with spacy init config -o accuracy, and then set the vectors location with a config override on prodigy train:

spacy init config -l en -p ner -o accuracy /path/to/config.cfg
prodigy train --ner dataset --config /path/to/config.cfg --initialize.vectors /path/to/spacy_vectors

This should be easier, though...
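To see what the -o accuracy config changes, you can inspect it before training. The sketch below is a minimal, hypothetical check using Python's configparser (spaCy config files use an INI-style syntax). It assumes the CPU layout that spacy init config -p ner generates, where static vectors are switched on by include_static_vectors in the [components.tok2vec.model.embed] section and [initialize] vectors says where they are loaded from. Those section names are an assumption about the generated template, so adjust them if your config differs; vector_settings is a name made up for this example.

```python
import configparser
from typing import Dict, Optional, Union


def vector_settings(cfg_text: str) -> Dict[str, Union[bool, Optional[str]]]:
    """Hypothetical helper: report the config values that control static vectors.

    Assumes the layout produced by `spacy init config -p ner -o accuracy` on CPU,
    where the shared tok2vec embedding lives in [components.tok2vec.model.embed].
    """
    # interpolation=None keeps spaCy's ${section.key} references as raw strings.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    embed = "components.tok2vec.model.embed"
    # A missing section or key falls back to "false" (vectors not used as features).
    include = parser.get(embed, "include_static_vectors", fallback="false")
    return {
        "include_static_vectors": include.strip().lower() == "true",
        "initialize.vectors": parser.get("initialize", "vectors", fallback=None),
        "paths.vectors": parser.get("paths", "vectors", fallback=None),
    }
```

For example, vector_settings(Path("/path/to/config.cfg").read_text()) should report include_static_vectors as True for an accuracy config. The --initialize.vectors override on prodigy train then replaces whatever initialize.vectors holds in the file, so the tok2vec layer that ner listens to is initialized with your Gensim vectors.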
