ner.batch-train with new data on existing model

BLP · May 19, 2019, 2:41pm

Hi,

I have trained the model “de_core_news_sm” with two entities (ORG and MONEY) in the dataset “ds_1” and as a result I got a new model “de_core_news_sm_blp”. Now I have more data and I want to improve “de_core_news_sm_blp”. What is the best way to do so?

My process:

1. ner.teach with the new data, stored in “ds_1”, using the “de_core_news_sm_blp” model
2. ner.batch-train using the newly expanded dataset “ds_1”, again training from scratch from “de_core_news_sm”

What would be an alternative or better approach?

Thanks!

ines · May 20, 2019, 10:09am

Hi! Your workflow sounds good, I think that’s pretty much exactly what I would have suggested. Training from the same “base model” is definitely good, because it’ll let you avoid random side-effects from making lots of small updates to the same weights.

If you want to be extra safe, you could consider using a different dataset for the new annotations and then merging the two sets once you’re ready to train. It’s always easy to merge two datasets into one later, but it’s more annoying to separate a single dataset and remove examples if you’ve made a mistake.
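
To illustrate the merge step, here is a minimal sketch in plain Python. It is not Prodigy's own merge tooling: the helper names `example_key` and `merge_datasets` are hypothetical, the example records are made up, and the assumption is that each annotation is a dict with `text`, `spans` and `answer` keys. It only shows why keeping the new annotations separate is cheap: combining two sets later is a concatenation plus a duplicate check.

```python
import json
from typing import Dict, Iterable, List


def example_key(example: Dict) -> str:
    """Build a stable key from the text, its spans and the answer (hypothetical helper)."""
    spans = sorted(
        (span["start"], span["end"], span["label"])
        for span in example.get("spans", [])
    )
    return json.dumps(
        [example["text"], spans, example.get("answer")], ensure_ascii=False
    )


def merge_datasets(*datasets: Iterable[Dict]) -> List[Dict]:
    """Concatenate datasets in order, keeping the first copy of each exact duplicate."""
    seen = set()
    merged = []
    for dataset in datasets:
        for example in dataset:
            key = example_key(example)
            if key not in seen:
                seen.add(key)
                merged.append(example)
    return merged


# Made-up records standing in for the original and the new annotations.
ds_1 = [
    {"text": "Siemens zahlt 5 Mio. Euro.",
     "spans": [{"start": 0, "end": 7, "label": "ORG"}],
     "answer": "accept"},
    {"text": "Die Strafe betrug 200 Euro.",
     "spans": [{"start": 18, "end": 26, "label": "MONEY"}],
     "answer": "accept"},
]
ds_2 = [
    # Exact duplicate of the first record in ds_1, so it is dropped on merge.
    {"text": "Siemens zahlt 5 Mio. Euro.",
     "spans": [{"start": 0, "end": 7, "label": "ORG"}],
     "answer": "accept"},
    {"text": "Bosch investiert in Dresden.",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}],
     "answer": "accept"},
]

merged = merge_datasets(ds_1, ds_2)
print(len(merged))
```

Going the other way, pulling the second batch back out of an already combined set, would need some record of which example came from where, which is the annoying part.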
