Add a whole bunch of entities via a vocabulary

qricambi · July 13, 2021, 7:45am

Hi, I've been using Prodigy for a while now and I've had pretty good results already. There is one problem, however. Say I have a list of entities (something like ABC, DEF, GHI, JKL, etc., all labelled LETTERS). My training dataset, while quite big, doesn't mention ALL of these entities, as there are too many of them. As a result, the trained model recognizes ABC and DEF most of the time, but fails to recognize GHI and JKL, since they never appeared in the dataset I annotated. My question is: would it be possible to add an underlying vocabulary containing all of my terms and their respective labels? I already did something like that with a patterns file in the very first step with ner.manual, but I'd like the trained model to recognize these entities as well.

The only other option I can think of is to generate a fake dataset containing all the entities we have, but I hope there is a smarter way.

Thanks

adriane · July 13, 2021, 9:09am

Hi, if you have a full pattern list for the entities you always want to label, you can add an entity_ruler to your final pipeline to annotate them directly. If you're using Prodigy v1.10, here are the corresponding spaCy v2 docs: https://v2.spacy.io/api/entityruler. The pattern format should be the same, because Prodigy and the entity ruler both use spaCy's matchers underneath.

Neither entity_ruler nor ner overwrites existing entities by default. Typically, people run the entity ruler first, so that all the known entities from the patterns are guaranteed to be annotated, and then run ner to fill in the rest. The order of the components can affect the ner model's results a bit, and it's also possible to have the entity ruler overwrite entities if it runs second, so you'd have to try out the options and see what makes sense for your task.
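
For illustration, here is a minimal sketch of the entity ruler approach. It assumes spaCy v3, where the ruler is added by its string name; on spaCy v2 (the version used with Prodigy v1.10) you would instead construct an EntityRuler object and pass it to nlp.add_pipe, as described in the docs linked above. The term list and the example sentence are made up, and a blank English pipeline stands in for the trained model so the snippet runs without downloading anything.

```python
import spacy

# Hypothetical vocabulary; in practice this would be the full term list.
TERMS = ["ABC", "DEF", "GHI", "JKL"]

# One pattern per term, in the same label/pattern format as a Prodigy patterns file.
patterns = [{"label": "LETTERS", "pattern": term} for term in TERMS]

# A blank pipeline keeps the sketch self-contained. With a trained pipeline,
# load that instead and add the ruler with before="ner" so it runs first.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)

# GHI and JKL are matched by the patterns even though no model was trained on them.
doc = nlp("We shipped GHI and JKL last week.")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

To try the other ordering mentioned above, the ruler could be added after ner with config={"overwrite_ents": True}, so that pattern matches replace overlapping model predictions.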

qricambi · July 13, 2021, 3:16pm

Thank you, it looks like this is exactly what I was looking for.
