I’m starting to experiment with Prodigy. First of all, I want to say that it is an outstanding piece of software, and it’s hard to believe that it comes from only two people. Great work!
Are there any general guidelines for how fine-grained the entity labels should be?
Is it better to have very generic labels that correspond to many types of entities (up to the extreme case where entities are simply tagged “entity” as in the core model of https://allenai.github.io/scispacy/)? Or is it better to have finer-grained labels that maybe correspond to entities that will appear in similar syntactic constructions? If so, what is an upper limit to how many different labels one should include?
And as a small side question: is there anything that limits entities to concepts (as is the case in most spaCy models I’ve seen)? Could I, for instance, define a “relation indicator” entity that matches things like:
“Vitamin D deficiency is strongly associated with fatigue.”
“There are clear signs that narcolepsy can be caused by dysfunctions in GABA receptors.”
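To make the side question concrete, here is a rough sketch of the annotations I have in mind, written as plain Python dicts with character-offset spans (`text` plus `spans` with `start`, `end`, and `label`). The label name `REL_INDICATOR` and the helper `make_task` are made up for illustration, and the span boundaries I picked ("associated with", "caused by") are only one possible choice, so please treat this as an assumption rather than a recommended scheme.

```python
# Hypothetical label name; not a built-in spaCy or Prodigy label.
LABEL = "REL_INDICATOR"

# (sentence, phrase I would like to tag); the boundaries are illustrative only.
examples = [
    ("Vitamin D deficiency is strongly associated with fatigue.",
     "associated with"),
    ("There are clear signs that narcolepsy can be caused by "
     "dysfunctions in GABA receptors",
     "caused by"),
]


def make_task(text, phrase, label=LABEL):
    """Build one annotation dict with a single character-offset span."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    end = start + len(phrase)
    return {"text": text, "spans": [{"start": start, "end": end, "label": label}]}


tasks = [make_task(text, phrase) for text, phrase in examples]

for task in tasks:
    span = task["spans"][0]
    # Show the label and the exact substring covered by the span.
    print(span["label"], repr(task["text"][span["start"]:span["end"]]))
```

So the question is whether spans like these, which mark a relation cue rather than a concept, are something the entity recognizer can reasonably be trained on.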