Do word vectors have an effect on NER accuracy?

daniel · March 27, 2018, 2:28pm

Do word vectors have any effect on the accuracy of NER?

If they don’t, how do I remove the vectors from an existing model that has them?

If they do, does it make sense to replace the vectors with ones created only from my input data? How?

I see noticeable differences in NER accuracy between the en_core_web_sm, en_core_web_md and en_core_web_lg models (en_core_web_md outperforms the others). If the vectors are not the cause, I’d like to remove them to reduce model size and loading time. If they do have an effect, I’d like to try vectors that are relevant to my data.

Thanks.

honnibal · March 27, 2018, 3:27pm

The vectors are used as features if present, yes — so training vectors on your own data should be helpful. The best way to achieve this is with the spacy init-model command, which accepts a word vectors file in word2vec or FastText’s plain text format. You might try the FastText vectors from here: https://fasttext.cc/docs/en/english-vectors.html
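For reference, the word2vec plain text format mentioned above is a header line with the number of words and the vector width, followed by one line per word with its values separated by spaces. Below is a minimal sketch of writing that format from Python. The helper name write_word2vec_text and the toy vectors are made up for illustration and are not part of spaCy; real vectors would come from whatever tool you trained them with.

```python
import io


def write_word2vec_text(vectors, out):
    """Write a {word: [floats]} mapping in word2vec plain text format.

    Hypothetical helper for illustration; not a spaCy function.
    """
    if not vectors:
        raise ValueError("no vectors to write")
    dims = {len(vec) for vec in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must have the same width")
    dim = dims.pop()
    # Header line: "<number of words> <vector width>"
    out.write(f"{len(vectors)} {dim}\n")
    # One line per word: the word, then its values, space-separated
    for word, vec in vectors.items():
        values = " ".join(f"{x:.6f}" for x in vec)
        out.write(f"{word} {values}\n")


# Toy data (assumed); written to an in-memory buffer instead of a file
toy_vectors = {"apple": [0.1, 0.2, 0.3], "banana": [0.0, -0.5, 0.25]}
buf = io.StringIO()
write_word2vec_text(toy_vectors, buf)
w2v_text = buf.getvalue()
print(w2v_text)
```

To produce a file for spacy init-model, open a real file instead of the StringIO buffer and pass its path to the command. Check python -m spacy init-model --help for the exact option names in your installed version.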

The en_core_web_md model has the same initial vectors as en_core_web_lg, but only keeps the rows for the 20k most frequent words in the vocab. All other words are mapped to their nearest neighbour within those frequent words. This works pretty well: the top 20k most frequent words should cover more than 95% of the tokens in the text, and many other tokens still get some representation. You can activate this setting with the --prune-vectors flag on spacy init-model.
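The pruning idea described above can be sketched in plain Python: keep the rows for the most frequent words, and point every other word at its most similar kept word. This is only an illustration of the technique under simple assumptions (cosine similarity, toy data, made-up function names), not spaCy's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_vectors(vectors, freqs, n_keep):
    """Keep vectors for the n_keep most frequent words; remap the rest.

    Hypothetical helper for illustration. Returns (kept, remap), where
    remap sends each dropped word to its nearest kept neighbour.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    # Rank words from most to least frequent
    ranked = sorted(vectors, key=lambda w: freqs.get(w, 0), reverse=True)
    kept_words = ranked[:n_keep]
    remap = {}
    for word in ranked[n_keep:]:
        # Nearest neighbour among the kept words, by cosine similarity
        remap[word] = max(kept_words, key=lambda k: cosine(vectors[word], vectors[k]))
    kept = {w: vectors[w] for w in kept_words}
    return kept, remap


# Toy vectors and frequencies (assumed values)
vectors = {
    "the": [1.0, 0.0],
    "a": [0.9, 0.1],
    "cat": [0.0, 1.0],
    "kitten": [0.1, 0.9],
}
freqs = {"the": 1000, "a": 800, "cat": 50, "kitten": 5}

kept, remap = prune_vectors(vectors, freqs, n_keep=3)
print(sorted(kept))  # words that keep their own row
print(remap)         # dropped words and the kept word they now share a row with
```

Here "kitten" loses its own row and reuses the vector for "cat", which is the sense in which rare tokens "still get some representation". In spaCy itself the same operation is available from Python as nlp.vocab.prune_vectors, in addition to the --prune-vectors flag.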
