Advice on training NER models with new entities

ines · January 4, 2019, 11:15pm

Yes, that's correct!

(One quick note on vectors: If you do end up with good vectors for your domain, using them in the base model can sometimes improve accuracy. If you're training a spaCy model and vectors are available in the model, they'll be used during training.)

Yes, that's correct. ner.make-gold can only pre-highlight entities that are predicted by the model, so this only works if the model already knows the entity type. If some of your entity types are already present in the model and others aren't, you could also combine the two recipes: start by annotating the existing labels with ner.make-gold, export the data, load it into ner.manual and add the new labels on top. How you do that depends on what's most efficient for your use case.
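To make the combined workflow a bit more concrete, here's a rough sketch of how you could work out which labels go into which step. The function name `plan_annotation` and the `GENE` label are made up for illustration. With spaCy, the labels the model already knows are usually available as `nlp.get_pipe("ner").labels`, but here they're just passed in as a plain list.

```python
def plan_annotation(model_labels, wanted_labels):
    """Decide which labels each annotation step should cover.

    model_labels: labels the existing model can already predict.
    wanted_labels: all labels you want in the final dataset.
    """
    known_set = set(model_labels)
    # Labels the model can pre-highlight, so ner.make-gold can help here.
    known = [label for label in wanted_labels if label in known_set]
    # Labels the model has never seen; these need manual annotation.
    new = [label for label in wanted_labels if label not in known_set]
    return {
        "make_gold": known,      # step 1: correct the model's predictions
        "manual": known + new,   # step 2: keep step 1 labels, add new ones
    }


# Hypothetical example: the model knows PERSON, ORG and GPE, GENE is new.
plan = plan_annotation(["PERSON", "ORG", "GPE"], ["ORG", "GENE", "PERSON"])
print(plan["make_gold"])  # ['ORG', 'PERSON']
print(plan["manual"])     # ['ORG', 'PERSON', 'GENE']
```

If the "make_gold" list comes out empty, the model doesn't know any of your labels and you can go straight to ner.manual.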

Training and evaluation examples should ideally be drawn from the same data source, yes. The examples should also be representative of what your model will see at runtime: for example, if you're processing short paragraphs at runtime, you also want to evaluate the model on short paragraphs (and not, say, short sentences only). Also double-check that there's no overlap between the training and evaluation examples, because even a single duplicated example can lead to pretty distorted results. For the evaluation set, 20-50% of the number of training examples is usually a good amount. If you have under a thousand examples for evaluation, you might have to take the evaluation results with a grain of salt.
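Here's a minimal sketch of one way to do the split without overlap, assuming your examples are dicts with a "text" key (as in an exported JSONL file). The function name `split_train_eval`, the `eval_to_train` parameter and the lowercase/strip normalization are my own assumptions for the example, not something built into Prodigy, so adjust them to whatever counts as a duplicate in your data.

```python
import hashlib
import random


def split_train_eval(examples, eval_to_train=0.25, seed=0):
    """Deduplicate examples by text, then split them into train and eval.

    eval_to_train is the size of the evaluation set relative to the
    training set, e.g. 0.25 means one eval example per four training ones.
    """
    seen = set()
    unique = []
    for eg in examples:
        # Hash the normalized text so the same text is only kept once and
        # can never end up in both the training and the evaluation set.
        normalized = eg["text"].strip().lower()
        key = hashlib.sha1(normalized.encode("utf8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(eg)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(unique)
    # Solve n_eval = eval_to_train * n_train with n_train + n_eval = total.
    n_eval = round(len(unique) * eval_to_train / (1 + eval_to_train))
    return unique[n_eval:], unique[:n_eval]


# Ten distinct texts plus two duplicates that differ only in case/whitespace.
examples = [{"text": f"text {i}"} for i in range(10)]
examples += [{"text": "text 0"}, {"text": " Text 1 "}]
train, dev = split_train_eval(examples)
print(len(train), len(dev))  # 8 2
```

Because the duplicates are removed before the split, the two sets can't share a text, and the fixed seed means you get the same evaluation set every time you rerun it.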
