Adding new named entities to an existing model

Hi, I'm having issues when adding a new entity type to a previously trained model. Let's say I have an entity "A" that I want to add to the en_core_web_lg model. This entity is completely new, comprising words that the model very rarely recognizes in any of its own entity categories. I created annotations for this entity using ner.manual and then used the following command to train and update the base model:

prodigy train A-model --ner annotation_dataset -m en_core_web_lg -L -V

The newly trained model "A-model" does not seem to remember any of the original en_core_web_lg entities, both when evaluated using the print stream recipe and when I verified it directly in spaCy.

My ultimate goal is to add one more NER category to this model (so two in total, A plus a new one, B, added to en_core_web_lg) without losing any of the native categories. I was planning on doing this iteratively, i.e., training A first and then building on the A-model with the B entity to create an AB-model.

What am I doing wrong? Do I just need to use en_core_web_lg to pre-annotate, then add my own entities and train a model from scratch?

Hi @Pragma_Tom, welcome to Prodigy!

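This looks like a case of "catastrophic forgetting", which often becomes apparent when new entity types are added to an existing model: if the training data only contains the new label, the updates can overwrite what the model had learned about its original labels.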

Yes. The best practice is "pseudo-rehearsal", i.e., using the original model to label examples and mixing those examples into your fine-tuning updates. For other strategies, you can check the following threads:
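To make the idea more concrete, here is a minimal sketch of one way to combine manual annotations with the original model's predictions on the same texts. It is only an illustration, not an official recipe: the helper names (merge_spans, to_prodigy_task, predict_spans) are hypothetical, the example text and offsets are made up, and it assumes you want a manual span to win whenever it overlaps a predicted one.

```python
from typing import Dict, List, Tuple

# A span is (start_char, end_char, label). These helpers are hypothetical
# names used for illustration; they are not part of Prodigy or spaCy.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if two character spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def merge_spans(manual: List[Span], predicted: List[Span]) -> List[Span]:
    """Keep every manual span, then add predicted spans that do not overlap
    anything already kept (assumption: manual annotations take priority)."""
    merged = list(manual)
    for span in predicted:
        if not any(overlaps(span, kept) for kept in merged):
            merged.append(span)
    return sorted(merged)


def to_prodigy_task(text: str, spans: List[Span]) -> Dict:
    """Build a task dict with "text" and "spans" keys, the general shape
    used for NER annotations in Prodigy's JSONL format."""
    return {
        "text": text,
        "spans": [{"start": s, "end": e, "label": lab} for s, e, lab in spans],
    }


def predict_spans(text: str, model_name: str = "en_core_web_lg") -> List[Span]:
    """Label a text with the original model. spaCy is imported lazily so the
    rest of this sketch runs without the model installed."""
    import spacy

    nlp = spacy.load(model_name)  # in practice, load once and reuse
    return [(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]


# Made-up example: one manual "A" span plus hand-written stand-ins for what
# the original model might predict on the same text.
text = "Acme Corp shipped the ZX-9 widget to Berlin."
manual = [(22, 26, "A")]
predicted = [(0, 9, "ORG"), (22, 33, "PRODUCT"), (37, 43, "GPE")]

task = to_prodigy_task(text, merge_spans(manual, predicted))
```

In this example the predicted PRODUCT span is dropped because it overlaps the manual "A" span, while ORG and GPE are kept. With the model installed you would replace the hand-written predictions with predict_spans(text), write the resulting tasks out as JSONL, and train on a dataset that contains both the new and the original labels.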

We also published a blog post on pseudo-rehearsal that explains how it addresses catastrophic forgetting.

LJ, Thank you for the welcome, the quick response, and the references! Much appreciated.