I want to train a new NER model on my own annotated data by following the example. If I start from a pre-trained model, for example ‘en_core_web_lg’, how can I predict out-of-vocabulary entities? I can provide a lot of training data (thousands of examples) covering various animals (cat, dog, etc.).
Doesn’t this pre-trained model use word vectors, so that it could potentially identify entities similar to the ones in the training data? I’m a bit puzzled.