I'm creating several new custom entities based on a corpus of approximately 20K sentences. The corpus contains many examples of several entities recognized by en_core_web_lg (PERSON, ORG, etc.), but a number of others are missing or poorly represented, such as NORP, FAC, PRODUCT, EVENT, and WORK_OF_ART. Prodigy NER training starting from the en_core_web_lg model works great on my new entities and on the pretrained ones that are well represented in my corpus. Although my corpus currently lacks examples of several pretrained entities, they may appear in future, as-yet-unseen sentences, and I would like to retain spaCy's ability to find them.
How do I get examples of these entities so that spaCy doesn't forget about them? Is there a dataset that I can add to my corpus to cover the missing entities? Is there another approach, not covered by the Annotation Flowchart, that I should try?
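
To make the question concrete, here is a minimal sketch of one possible approach, offered as an assumption rather than a documented Prodigy workflow: run the original en_core_web_lg over extra text, keep its predictions for the under-represented labels as "silver" spans, and merge them with the manually annotated spans before training. The sketch uses plain dictionaries in the `{"start", "end", "label"}` span layout; the `merge_spans` helper and the `GADGET` label are hypothetical names used only for illustration, and the silver spans are written by hand here (in practice they might be built from `doc.ents` using `ent.start_char`, `ent.end_char`, and `ent.label_`).

```python
from typing import Dict, List, Set

Span = Dict[str, object]  # {"start": int, "end": int, "label": str}

# Pretrained labels that are missing or rare in the corpus (from the question).
RARE_LABELS: Set[str] = {"NORP", "FAC", "PRODUCT", "EVENT", "WORK_OF_ART"}


def merge_spans(gold: List[Span], silver: List[Span], keep_labels: Set[str]) -> List[Span]:
    """Hypothetical helper: keep every gold span, then add silver spans whose
    label is in keep_labels and which do not overlap anything already kept."""
    merged = list(gold)
    for s in silver:
        if s["label"] not in keep_labels:
            continue
        # Two character ranges overlap if each starts before the other ends.
        overlaps = any(s["start"] < m["end"] and m["start"] < s["end"] for m in merged)
        if not overlaps:
            merged.append(s)
    return sorted(merged, key=lambda span: span["start"])


text = "Acme Widget debuted at the World Cup."

# Manual annotation for a new custom entity (GADGET is a made-up label).
gold = [{"start": 0, "end": 11, "label": "GADGET"}]

# Stand-in for the original model's predictions on the same text.
silver = [
    {"start": 0, "end": 4, "label": "ORG"},        # not a rare label: skipped
    {"start": 0, "end": 11, "label": "PRODUCT"},   # overlaps gold: skipped
    {"start": 23, "end": 36, "label": "EVENT"},    # rare, no overlap: kept
]

merged = merge_spans(gold, silver, RARE_LABELS)
example = {"text": text, "spans": merged}
```

In this sketch the gold annotations always win on conflict, so model predictions can never overwrite a manual decision. Whether silver examples like these are enough to prevent forgetting is exactly what I'm unsure about.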