Predicting out-of-vocabulary entities with a new NER model

I want to train a new NER model on my annotated data by following the example. If I’m using a pre-trained model, for example ‘en_core_web_lg’, how can I predict out-of-vocabulary entities? I can provide a lot of training data (thousands of examples) with various animals (e.g. cat, dog, etc.).

Isn’t this pre-trained model using word vectors, so it could potentially identify entities similar to the ones in the training data? I’m a bit puzzled.
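For intuition on the word-vector point: if two words have similar vectors, the model sees similar features for them, so training on one helps it recognize the other. A toy sketch with made-up vectors (illustrative values only, not the real en_core_web_lg vectors):

```python
import numpy as np

# Stand-in word vectors: "cat" and "dog" point in similar directions,
# "sell" does not. Real pretrained vectors behave the same way at scale.
vectors = {
    "cat":  np.array([0.9, 0.1, 0.0]),
    "dog":  np.array([0.8, 0.2, 0.1]),
    "sell": np.array([0.0, 0.1, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["cat"], vectors["dog"]))   # high: similar words
print(cosine(vectors["cat"], vectors["sell"]))  # low: unrelated words
```

Because the NER model's features are built on top of such vectors, annotating lots of "cat" and "dog" examples nudges it toward tagging other animal words it has never seen annotated.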

We’re able to learn new vocabulary items without resizing the embedding table. This is one of the big advantages of the hash embeddings used in spaCy. I explain it here: Can you explain how exactly HashEmbed works?
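A minimal sketch of the hashing trick behind this (toy table sizes and plain MD5 hashing here for illustration; spaCy's actual HashEmbed uses MurmurHash and the table rows are learned during training):

```python
import hashlib

import numpy as np

N_ROWS, DIM, SEEDS = 1000, 4, (1, 2, 3)  # toy sizes; real tables are larger

def bucket(word: str, seed: int) -> int:
    """Deterministically map (seed, word) to a row of the fixed-size table."""
    digest = hashlib.md5(f"{seed}:{word}".encode()).hexdigest()
    return int(digest, 16) % N_ROWS

def hash_embed(word: str, table: np.ndarray) -> np.ndarray:
    """Sum one table row per seed: every word, seen or not, gets a vector."""
    return sum(table[bucket(word, seed)] for seed in SEEDS)

rng = np.random.default_rng(0)
table = rng.standard_normal((N_ROWS, DIM))  # fixed size, never resized

print(hash_embed("cat", table))    # a word from the training data
print(hash_embed("zebra", table))  # an out-of-vocabulary word works the same way
```

Because the table never grows, an unseen word simply hashes to some existing rows; using several seeds means two words rarely collide on all of their rows, so their summed vectors stay distinguishable and the model can still learn word-specific behaviour.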

Thanks Matt, I will try it.