I've created an NER model that seems to be working very well. Now I would like to do the relations between the entities.
There are a few things I'm unsure about and haven't been able to find a satisfying answer to.
When I'm done annotating the relations, can I train a model using "prodigy train parser my_rel_dataset....."? Or should that strictly be used for dependency parsing? I trained one relation model this way and it seemed to work quite well, but I'm not sure whether this is the right way to do it.
Many of my relations are between multiple people and one action, like "Jerry, Jim, and Pam competed in the race", where I want to link each person to "race". Should I do a many-to-one relation or a one-to-many relation? Does it matter whether a child has multiple heads or a head has multiple children?
When annotating, I have a lot of named entities but I only really care about the relationship between two of them. Is it more beneficial to annotate all relations for training purposes or just focus on the one I care about?
Can you display relations with displaCy? I tried this once after a test and it only showed one of my labels linking back in on itself for each word, so I assumed I had messed up somewhere.
Hi! The relations workflow in Prodigy is still very new and it's important to note that spaCy currently doesn't have a built-in out-of-the-box model implementation for predicting relations between entities – so you'd have to bring your own. Some related threads:
On training with "prodigy train parser": yes, the parser is designed specifically for dependency parsing, so using it for entity relations wouldn't really work. The task of dependency parsing and the assumptions a parser makes are quite different from what you're going for.
On many-to-one versus one-to-many relations: there's no definitive answer, and it will depend on the model you're using and what exactly you want to predict.
Since the relation prediction task would happen in a separate step, I think it's fine to just focus on the entities you're interested in. For example, you could only run your relation extraction logic if an entity of type PERSON is found.
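As a rough illustration of gating the relation step on the entity types you care about, here is a minimal sketch in plain Python. The function name candidate_pairs, the (text, label) tuple format and the EVENT label are assumptions made up for this example, not part of spaCy or Prodigy; your own relation model would take the place of whatever consumes the pairs.

```python
from itertools import product


def candidate_pairs(entities, child_label="PERSON", head_label="EVENT"):
    """Return (child, head) entity pairs worth passing to a relation model.

    `entities` is a list of (text, label) tuples, e.g. collected from an
    NER step. The label names are placeholders for this sketch.
    """
    children = [ent for ent in entities if ent[1] == child_label]
    heads = [ent for ent in entities if ent[1] == head_label]
    # If either side is missing, the product is empty, so the relation
    # step can be skipped for this text.
    return list(product(children, heads))


entities = [
    ("Jerry", "PERSON"),
    ("Jim", "PERSON"),
    ("Pam", "PERSON"),
    ("Boston", "GPE"),   # ignored: not one of the types of interest
    ("race", "EVENT"),
]
pairs = candidate_pairs(entities)
no_pairs = candidate_pairs([("Boston", "GPE")])
```

Here pairs holds one (person, race) candidate per person, and no_pairs is empty because no PERSON or EVENT entity was found.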
On displaCy: the built-in dep visualizer can visualize Doc objects using their token.head and token.dep_ values. So if those are set (which you can also do manually), you'll be able to visualize it. If you want to visualize relations between spans like entities, one easy option would be to merge them into one token (using Doc.retokenize) beforehand.
(If you tried to render a visualization and all tokens were attached to themselves, this usually indicates that the heads weren't set and the default values were used.)
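To make the visualization idea concrete, here is a small sketch that takes a slightly different route from setting token.head directly: it builds the dictionary that displaCy's dep visualizer accepts in manual mode (a list of "words" plus a list of "arcs"). The helper to_displacy_manual, the COMPETED_IN label and the example indices are assumptions for illustration only. Each entry in words stands for one token, so multi-word entities are assumed to have been merged already. Note that each child gets exactly one head here, which mirrors how token.head works: several people pointing at "race" fits that shape, whereas one token with several heads would not.

```python
def to_displacy_manual(words, relations):
    """Build displaCy's manual "dep" input from entity-level relations.

    words:     list of (text, tag) tuples, one per (already merged) token
    relations: list of (child_index, head_index, label) tuples
    """
    arcs = []
    for child, head, label in relations:
        if child == head:
            continue  # a token attached to itself has no arc to draw
        arcs.append({
            # displaCy expects start < end
            "start": min(child, head),
            "end": max(child, head),
            "label": label,
            # "dir" marks which side of the arc the child is on
            "dir": "left" if child < head else "right",
        })
    return {
        "words": [{"text": text, "tag": tag} for text, tag in words],
        "arcs": arcs,
    }


words = [
    ("Jerry", "PERSON"), (",", ""), ("Jim", "PERSON"), (",", ""),
    ("and", ""), ("Pam", "PERSON"), ("competed", ""), ("in", ""),
    ("the", ""), ("race", "EVENT"),
]
# Every person (child) is linked to "race" (head, index 9).
relations = [
    (0, 9, "COMPETED_IN"),
    (2, 9, "COMPETED_IN"),
    (5, 9, "COMPETED_IN"),
]
parsed = to_displacy_manual(words, relations)

# With spaCy installed, this could then be rendered with:
#   from spacy import displacy
#   html = displacy.render(parsed, style="dep", manual=True)
```

If you'd rather work with a real Doc, the same relations could be written to token.head and token.dep_ after merging the entity spans, as described above.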