Hi! I have a few questions that are more about spaCy than Prodigy. I apologize if they are off-topic here, but I haven't found anywhere else to ask (I left a question on Stack, but it remains unanswered, so I am posting it here almost unchanged).
I need to annotate a corpus of international relations/policy articles, for which I use spaCy (with Prodigy on top). The default English models come with a set of pre-defined entity types, most of which are, in theory, easily applicable to my purposes. However, the only piece of documentation I found is a table with very short descriptions, which does not answer the (quite numerous) questions I ran into while working on the annotation.
So my primary question is: is there more detailed, accurate documentation for all these entity types (e.g. NORP, GPE and so on), or at least an extensive set of examples? I simply fear that I might have been looking in the wrong places all this time.
I also suspect that generalized guides might be considered rather dull by some, because they would not address many very individual cases, but I also think that having them would make the basics a lot easier for newcomers (like me).
And in case such documentation is nowhere to be found, I would appreciate it if someone could help at least with the most important questions (I consider them too small to open a separate topic for each, but I might be wrong):
- When the name of something is followed by its abbreviation (see the example below), should it be annotated as one entity or as two separate entities? What should guide my choice?
Non-proliferation treaty (NPT), which contains the only binding commitment to nuclear disarmament in a multilateral treaty (...)
- Similarly, when one phrase implies two entities, but they are not 100% separated syntactically, how can I capture both entities correctly? Take the following example, which refers to two separate events:
concluding documents of the Madrid and Vienna conferences
- There are certain cases of ambiguity, e.g. 'Kyoto' may refer to the protocol just as well as to the city:
(...) undermines most points of the Kyoto.
- Finally, there's a question I think I have found an answer to. I asked the following: "Is NORP only meant for tagging names of national/ethnic groups, or is it also used when an adjective indicates that another entity belongs to some national, political or religious group? So in 'Iranian nuclear program', is it OK to tag 'Iranian' as NORP?" Judging by this example, my assumption was correct.
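To make the first question (name followed by abbreviation) more concrete, here is a minimal sketch of the two options written as character-offset spans. It is plain Python and does not call spaCy or Prodigy. The `make_span` helper is hypothetical, the `text` is a shortened version of my example, and the `LAW` label is only my guess at the right type for a treaty.

```python
# Shortened version of the example sentence (assumption: trimmed for brevity).
text = "Non-proliferation treaty (NPT), which contains the only binding commitment"


def make_span(text, phrase, label):
    """Hypothetical helper: span for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return {"start": start, "end": start + len(phrase), "label": label}


# Option A: one entity covering the full name and the abbreviation in brackets.
option_a = [make_span(text, "Non-proliferation treaty (NPT)", "LAW")]

# Option B: two separate entities, the full name and the abbreviation.
option_b = [
    make_span(text, "Non-proliferation treaty", "LAW"),
    make_span(text, "NPT", "LAW"),
]

for name, spans in (("A", option_a), ("B", option_b)):
    # Show the covered text next to each label to check the offsets by eye.
    print(name, [(text[s["start"]:s["end"]], s["label"]) for s in spans])
```

Which of the two is considered correct is exactly what I am unsure about; the sketch only shows what each choice would look like in the annotations.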