Domain Specific Dictionary Files

sh123 · May 28, 2021, 10:07am

Hi, I am new to Prodigy and want to build an NER model for Italian-language medical data. I am using Flair embeddings and I have several domain-specific dictionary files (person names, city names, etc.). I am confused about how to add this dictionary data to NER models. Any suggestions would be very helpful.

Thanks!

ines · May 31, 2021, 2:39am

Hi! I think the most straightforward option would be to stream in your raw text, match all the entries in your dictionary in the text if they occur, and then correct/update them manually to create your final training data. So every time a person name from your dictionary is found, you add a span for PERSON to the example. Here's an example that shows how you can stream in predictions from a custom model – but instead of a custom model, this could also just be your dictionary lookup: https://prodi.gy/docs/named-entity-recognition#custom-model
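As a rough illustration of the dictionary-lookup idea above, here is a minimal sketch in plain Python. The dictionary contents, the `CITY` label and the helper names `build_matchers` and `add_dictionary_spans` are made up for the example. It assumes that plain case-insensitive string matching is good enough for a first pass and produces tasks with a "text" and a list of character-offset "spans" (start, end, label). Depending on the annotation interface, you would likely still need to add tokenization before showing these tasks for manual correction.

```python
import re

# Hypothetical dictionaries; in practice, load these from your own files.
DICTIONARIES = {
    "PERSON": ["Mario Rossi", "Giulia Bianchi"],
    "CITY": ["Milano", "Roma"],
}


def build_matchers(dictionaries):
    """Compile one case-insensitive regex per label, longest entries first."""
    matchers = {}
    for label, entries in dictionaries.items():
        ordered = sorted({e for e in entries if e.strip()}, key=len, reverse=True)
        if not ordered:
            continue  # skip labels with no usable entries
        alternatives = "|".join(re.escape(entry) for entry in ordered)
        # \b keeps "Roma" from matching inside "Romagna".
        matchers[label] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return matchers


def add_dictionary_spans(texts, dictionaries):
    """Yield one task per text, with a span for every dictionary match."""
    matchers = build_matchers(dictionaries)
    for text in texts:
        spans = []
        for label, pattern in matchers.items():
            for match in pattern.finditer(text):
                spans.append(
                    {"start": match.start(), "end": match.end(), "label": label}
                )
        spans.sort(key=lambda span: span["start"])
        yield {"text": text, "spans": spans}


tasks = list(
    add_dictionary_spans(
        ["Il paziente Mario Rossi è stato ricoverato a Milano."], DICTIONARIES
    )
)
```

The sketch does not resolve overlaps between labels (for example, a city name that is also a surname), so conflicting matches would have to be sorted out during annotation.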

Prodigy's built-in recipes let you use spaCy's Matcher to pre-label examples based on token-based rules and dictionaries: https://prodi.gy/docs/named-entity-recognition#manual-patterns This can be a bit more flexible than just dictionary lookups, because you'll be able to describe tokens and their attributes and do stuff like "any number plus case-insensitive 'january', 'february', ...".
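To use dictionary files with the patterns approach, the entries first need to be converted into a patterns file with one JSON object per line, each holding a "label" and a "pattern". The sketch below is one possible way to do that. The helper names `entry_to_pattern` and `write_patterns` are hypothetical, and splitting entries on whitespace is an assumption that will not always agree with spaCy's tokenizer (for example, for names containing punctuation).

```python
import json


def entry_to_pattern(label, entry):
    """Turn a dictionary entry into a case-insensitive token pattern."""
    # Assumption: whitespace splitting approximates the tokenizer's output.
    tokens = [{"lower": word.lower()} for word in entry.split()]
    return {"label": label, "pattern": tokens}


def write_patterns(dictionaries, path):
    """Write one JSON pattern per line and return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for label, entries in dictionaries.items():
            for entry in entries:
                if not entry.strip():
                    continue  # ignore blank lines from the dictionary file
                pattern = entry_to_pattern(label, entry)
                f.write(json.dumps(pattern, ensure_ascii=False) + "\n")
                count += 1
    return count
```

The resulting file could then be passed to a recipe that accepts patterns, along the lines of `prodigy ner.manual your_dataset blank:it ./data.jsonl --label PERSON,CITY --patterns ./patterns.jsonl`, where the dataset name, file paths and labels are placeholders.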

It's typically a good idea to view your dictionary matches during annotation and correct them, so you can get a feeling for what's missing and correct any mistakes. Those are often the ones that are especially interesting: misspellings, new entities that aren't in your dictionary yet and of course ambiguous entities where the context matters ("Apple" the company vs. "apple" the fruit), which is where NER makes the most difference. If there are spans that you know will always be a given entity, you can always add rules on top to boost your accuracy (for example, "Apple, Inc." will always be an ORG): https://spacy.io/usage/rule-based-matching#entityruler
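The "rules on top" idea can be illustrated without any library: spans that are known to always be a given entity replace whatever the model predicted in the same place. This is only a plain-Python stand-in to show the logic, not how the EntityRuler is implemented. In a spaCy pipeline, the EntityRuler linked above would be the component to use. The function name `apply_override_rules` and the example predictions are hypothetical, and the sketch assumes the rule phrases do not overlap each other.

```python
def apply_override_rules(text, predicted_spans, rules):
    """Return spans where exact rule matches replace overlapping predictions."""
    rule_spans = []
    for phrase, label in rules.items():
        if not phrase:
            continue  # an empty phrase would match everywhere
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            rule_spans.append({"start": start, "end": end, "label": label})
            start = text.find(phrase, end)

    def overlaps(a, b):
        return a["start"] < b["end"] and b["start"] < a["end"]

    # Keep only predictions that do not collide with a rule match.
    kept = [
        span
        for span in predicted_spans
        if not any(overlaps(span, rule) for rule in rule_spans)
    ]
    return sorted(kept + rule_spans, key=lambda span: span["start"])


text = "Apple, Inc. opened an office in Rome."
# Made-up model output: "Apple" is mislabeled, "Rome" is fine.
predicted = [
    {"start": 0, "end": 5, "label": "PERSON"},
    {"start": text.index("Rome"), "end": text.index("Rome") + 4, "label": "GPE"},
]
merged = apply_override_rules(text, predicted, {"Apple, Inc.": "ORG"})
```

Here the mislabeled "Apple" prediction is dropped in favor of the rule's ORG span over "Apple, Inc.", while the unrelated "Rome" prediction is kept as it was.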
