Hi, I've been using Prodigy for a while now and have already had pretty good results. There is one problem, however. Say, for example, that I have a list of entities (something like ABC, DEF, GHI, JKL, etc., all labelled LETTERS). My training dataset, while quite big, doesn't mention ALL of these entities, because there are too many of them. As a result, the trained model recognizes ABC and DEF most of the time but fails to recognize GHI and JKL, since they never appeared in the dataset I annotated. My question is: would it be possible to add an underlying vocabulary containing all of my terms and their respective labels? I already did something similar with patterns in the very first step with ner.manual, but I'd like the trained model itself to recognize these entities.
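To make the "underlying vocabulary" idea concrete: it amounts to a term-to-label lookup applied to raw text, independent of what the statistical model learned. The sketch below is plain Python, not Prodigy or spaCy code; the VOCAB dictionary and the function names are hypothetical. Inside a spaCy pipeline, the built-in EntityRuler component applies phrase patterns like these alongside the trained ner component, which may be the closer fit.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical term -> label vocabulary, using the placeholder entities from the question.
VOCAB: Dict[str, str] = {
    "ABC": "LETTERS",
    "DEF": "LETTERS",
    "GHI": "LETTERS",
    "JKL": "LETTERS",
}


def build_pattern(vocab: Dict[str, str]) -> "re.Pattern[str]":
    # Try longer terms first so that overlapping terms prefer the longest match.
    terms = sorted(vocab, key=len, reverse=True)
    # \b word boundaries stop "ABC" from matching inside a word like "ABCD".
    return re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")


def match_entities(text: str, vocab: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return (start_char, end_char, label) for every vocabulary term found in text."""
    pattern = build_pattern(vocab)
    return [(m.start(), m.end(), vocab[m.group(0)]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(match_entities("We saw GHI and JKL today.", VOCAB))
```

A lookup like this finds every listed term regardless of training coverage, but it cannot generalize to terms missing from the list or resolve ambiguous ones by context; that is what the trained model is for.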
The only other option I can think of is to generate a fake dataset containing all the entities we have, but I hope there is a smarter way.
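For comparison, the "fake dataset" route could look roughly like the sketch below: each term is slotted into a template sentence and recorded with character offsets, assuming the text/spans JSONL shape that Prodigy uses for NER annotations. The templates, VOCAB and make_examples are hypothetical; real templates would need to resemble the actual domain text, since very repetitive sentences may teach the model surface patterns rather than the entities.

```python
import json
import random
from typing import Dict, List

# Hypothetical vocabulary and sentence templates; each template has exactly one "{}" slot.
VOCAB: Dict[str, str] = {"ABC": "LETTERS", "DEF": "LETTERS", "GHI": "LETTERS", "JKL": "LETTERS"}
TEMPLATES: List[str] = [
    "We ordered a new {} yesterday.",
    "The {} component failed the test.",
    "{} is mentioned in the report.",
]


def make_examples(vocab: Dict[str, str], templates: List[str], seed: int = 0) -> List[dict]:
    """Build one synthetic example per term, with the term's character span labelled."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    examples = []
    for term, label in vocab.items():
        template = rng.choice(templates)
        start = template.index("{}")  # the term starts where the placeholder was
        text = template.format(term)
        span = {"start": start, "end": start + len(term), "label": label}
        examples.append({"text": text, "spans": [span]})
    return examples


if __name__ == "__main__":
    # One JSON object per line (JSONL).
    for example in make_examples(VOCAB, TEMPLATES):
        print(json.dumps(example))
```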