Hi @rory-hurley-gds!
Thanks for the background and details on the experiment.
I noticed you also posted on spaCy.
I would agree with @rmitsch's point:
> we strongly recommend training your own NER model from scratch if your target labels change. One reason for that is fine-tuning may lead to catastrophic forgetting of previously learned labels. It's possible to use spaCy's rehearsal functionality to improve model stability, but retraining from scratch is the go-to approach for this as of now.
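To make the rehearsal idea concrete: the trick is to mix gold examples from the *old* label set into every batch of updates on your new annotations, so the model keeps revisiting what it already knows. Here's a rough, library-agnostic sketch of that batching step (the function name, fraction, and batch size are just illustrative; in spaCy itself this is exposed via `nlp.rehearse`):

```python
import random

def rehearsal_batches(new_examples, old_examples,
                      old_fraction=0.25, batch_size=8, seed=0):
    """Interleave a fraction of old gold-standard examples into each
    batch of new annotations (the core idea behind rehearsal), so
    updates keep reinforcing previously learned labels."""
    rng = random.Random(seed)
    n_old = max(1, int(batch_size * old_fraction))   # old examples per batch
    n_new = batch_size - n_old                       # new examples per batch
    batches = []
    for start in range(0, len(new_examples), n_new):
        batch = new_examples[start:start + n_new]
        batch += rng.sample(old_examples, n_old)     # replay some old data
        rng.shuffle(batch)
        batches.append(batch)
    return batches
```

You'd then feed each mixed batch to your usual update step. It's not a substitute for retraining from scratch, but it can noticeably reduce forgetting when you only have a small amount of new annotation.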
Have you seen our NER
flowchart? It covers a similar situation: deciding how to train new entity types. A few months ago, we updated the flowchart with links to relevant posts and documentation.
For example, the flowchart recommends that if you're adding more than three new entity types, you're better off training from scratch.
As Raphael mentioned, you may be dealing with catastrophic forgetting.
I like that post because it offers several ideas for how to overcome it.
If you do want to fine-tune, I'd recommend this post from Matt:
https://support.prodi.gy/t/work-flow-for-extending-an-ner
Hope this helps!