Prodigy Questions

ryanwesslen · July 20, 2022, 2:33pm

Thanks for your feedback! This is an interesting case. I'm not familiar with tagtog, but I'm glad you mentioned it so I can learn more!

Yes! I found these posts that are relevant:

It's possible to add nested entities or synonyms if you convert the .tsv file to a dictionary that maps each entity to its nested entities (you could do the same for the synonyms), like this:

# dictionary of lowercase entities mapped to subtypes
DRUG_SUBTYPES = {
    'citalopram': ['ANTIDEPRESSANT', 'SOMETHING_ELSE'],
    'lexapro': ['ANTIDEPRESSANT'],
    # etc.
}
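For the conversion step, here's a minimal sketch of how such a dictionary could be built from a tab-separated file. The column layout (entity in the first column, comma-separated subtypes in the second), the `load_subtypes` helper and the `SAMPLE_TSV` data are all assumptions for illustration, so adjust them to whatever your actual .tsv looks like:

```python
import csv
import io

# Hypothetical TSV content: entity <TAB> comma-separated subtypes.
# In practice you would open your own .tsv file instead.
SAMPLE_TSV = (
    "Citalopram\tANTIDEPRESSANT,SOMETHING_ELSE\n"
    "Lexapro\tANTIDEPRESSANT\n"
)


def load_subtypes(handle):
    """Map each lowercased entity to its list of subtypes."""
    subtypes = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        entity = row[0].strip().lower()
        labels = [label.strip() for label in row[1].split(",") if label.strip()]
        # setdefault lets an entity that appears on several rows accumulate labels
        subtypes.setdefault(entity, []).extend(labels)
    return subtypes


DRUG_SUBTYPES = load_subtypes(io.StringIO(SAMPLE_TSV))
```

With a real file you could pass `open("your_file.tsv", newline="", encoding="utf8")` in place of the `io.StringIO` object.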

Then you would follow the spaCy instructions for creating a custom component and adding it to your pipeline. I've posted a quick example of what it may look like:


https://gist.github.com/wesslen/25f8f694ce82934d74912f873785b7a1

pokemondict.tsv

1	Bulbasaur	Fushigidane
2	Ivysaur	Fushigisou
3	Venusaur	Fushigibana
4	Charmander	Hitokage
5	Charmeleon	Lizardo

spacy_synonym_subtype.py

# Assume we have an existing pattern matching rule-based entity (could also be a trained NER). This entity only identifies five different Pokemon characters as POKEMON.

from spacy.lang.en import English

nlp = English()
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "POKEMON", "pattern": [{"LOWER": "bulbasaur"}]},
            {"label": "POKEMON", "pattern": [{"LOWER": "ivysaur"}]},
            {"label": "POKEMON", "pattern": [{"LOWER": "venusaur"}]},
            {"label": "POKEMON", "pattern": [{"LOWER": "charmander"}]},

This file has been truncated.
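Since the preview cuts off before the custom component, here's a rough, self-contained sketch of the general idea, assuming spaCy v3. This is not the gist's code: the component name `synonym_lookup`, the `synonym` extension attribute and the two-entry `POKEMON_SYNONYMS` table are made up for illustration.

```python
from spacy.lang.en import English
from spacy.language import Language
from spacy.tokens import Span

# Hypothetical lookup table; in practice this could be built from the TSV.
POKEMON_SYNONYMS = {"bulbasaur": "Fushigidane", "ivysaur": "Fushigisou"}

# Register a custom attribute so each entity span can carry its synonym.
# force=True avoids an error if this script is re-run in the same session.
Span.set_extension("synonym", default=None, force=True)


@Language.component("synonym_lookup")
def synonym_lookup(doc):
    """Attach a synonym to every POKEMON entity found by earlier components."""
    for ent in doc.ents:
        if ent.label_ == "POKEMON":
            ent._.synonym = POKEMON_SYNONYMS.get(ent.text.lower())
    return doc


nlp = English()
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "POKEMON", "pattern": [{"LOWER": "bulbasaur"}]},
    {"label": "POKEMON", "pattern": [{"LOWER": "ivysaur"}]},
])
# The lookup must run after the entities have been set.
nlp.add_pipe("synonym_lookup", after="entity_ruler")

doc = nlp("I chose Bulbasaur over Ivysaur.")
results = [(ent.text, ent.label_, ent._.synonym) for ent in doc.ents]
```

The same pattern should work for subtypes: register a second extension (e.g. a list-valued one) and fill it from a dictionary like `DRUG_SUBTYPES`. If your entities come from a trained NER model instead of the entity ruler, you'd add the component after `"ner"`.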

Two important points to think about. First, model development happens in spaCy rather than in Prodigy. Prodigy is the UI tool for collecting more annotations, while spaCy is the NLP engine underneath. Prodigy does offer helpful training recipes, but these are really running spaCy. To get the greatest and quickest gains with Prodigy, it helps to learn more about spaCy. So it seems like this question is really "can spaCy do this?" rather than "can Prodigy do this?".

It is worth noting that you can use Prodigy with other NLP/Python libraries like TensorFlow or PyTorch, but that will require even more customization on the developer's part.

Second, and related, what separates Prodigy from many other annotation tools is that it is a developer tool. Prodigy is designed so that your developers can customize it by writing their own Python scripts to fit their unique needs (e.g., through custom recipes or custom interfaces). My favorite video that captures this design philosophy is the excellent talk "Let Them Write Code" by Ines.

Thanks again for your questions! Let us know if you have any further questions.
