Let's say you want to split GPE into COUNTRY, CITY and MISC. You can at least pre-tag a bunch of text with the initial model, and initially only annotate the examples it has labelled as GPE. You could do this in the textcat interface. You'll probably also want to group the examples, so that you only have to annotate "America" once. If some of your phrases are ambiguous, you could flag them. Alternatively, if you do want to annotate every instance rather than every type, it'll be efficient to order the queue so that you do all the "America" instances at once. That way you can click through quickly.
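As a rough sketch of the grouping and ordering idea (the mention list here is made up; in practice you'd collect these spans from the initial model's GPE predictions):

```python
from collections import defaultdict

# Hypothetical pre-tagged mentions the initial model labelled as GPE,
# as (span text, containing sentence) pairs.
gpe_mentions = [
    ("America", "She moved to America in 2010."),
    ("Berlin", "Berlin has great museums."),
    ("America", "America won the relay."),
    ("Paris", "He flew to Paris."),
    ("Berlin", "The wall fell in Berlin."),
]

def group_by_surface_form(mentions):
    """Group mentions by their exact text, so each unique string
    only needs one annotation decision."""
    groups = defaultdict(list)
    for span, sentence in mentions:
        groups[span].append(sentence)
    return groups

# Annotate per type: one decision covers all identical spans.
groups = group_by_surface_form(gpe_mentions)

# Or, if annotating per instance, order the queue so identical
# spans are adjacent and you can click through them quickly.
queue = sorted(gpe_mentions, key=lambda mention: mention[0])
```

Whether you annotate per type or per instance, the win is the same: you make the "America is a COUNTRY" decision once and reuse it.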
Of course, there will still be countries and cities the model didn't initially tag as GPE. But doing this first correction step will give you a lot of examples quickly, so you can get the initial model trained. Once it's working, you can use either the ner.teach or the ner.make-gold interface to fill in the missing entities.
If you're only interested in entity types that aren't in the initial model, there's not much to gain from resuming training: updates for the new types can degrade the existing weights, so it's probably going to hurt more than it helps.
You can still start teaching the model entity types it wasn't trained with, using the ner.teach interface. But to do that, you need to specify a patterns file. The patterns file is used to start suggesting some entities of the new type. Once you've accepted some of these suggestions, they're used as training examples, so the model can start making its own suggestions.