I'm having a lot of fun exploring Prodigy for a text mining task. Training an NER model from annotations is extremely rewarding, and it works surprisingly well with only a small number of manual annotations.
So far, I've been using `prodigy train ner ...` to assign entities. Let's say I have two entity types, FIRM and TECH. Some data to illustrate my task:
1. [IBM][FIRM] has identified [hybrid cloud][TECH] as the growth area it will focus on.
2. In early March 2017, [Snapchat][FIRM] officially went public. The company is one of the key players in [augmented reality][TECH] apps, and continues to invest in [computer vision][TECH] research.
3. [Amazon][FIRM] offers the most diverse offerings in [cloud computing][TECH]. [Microsoft][FIRM] offers similar solutions, and has ...
Ideally, I would like to extract the following relationships (across sentence boundaries if possible, see example #2, and with multiple associations, see examples #2 and #3):
[IBM] -> [hybrid cloud]
[Snapchat] -> [augmented reality] and
[Snapchat] -> [computer vision]
[Amazon] -> [cloud computing] and
[Microsoft] -> [cloud computing]
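To make the target output concrete, here is a naive co-occurrence baseline I could compare against (all names and the sentence-window idea are my own, not anything from Prodigy): it simply pairs every FIRM with every TECH within a small sentence window, so it recovers the pairs above but would over-generate on harder documents.

```python
# Naive baseline (hypothetical): pair every FIRM entity with every TECH
# entity that occurs within `window` sentences of it.
from itertools import product

def cooccurrence_relations(entities, window=2):
    """Return (firm_text, tech_text) candidate pairs whose sentence
    indices are at most `window` apart."""
    firms = [e for e in entities if e["label"] == "FIRM"]
    techs = [e for e in entities if e["label"] == "TECH"]
    return [
        (f["text"], t["text"])
        for f, t in product(firms, techs)
        if abs(f["sent"] - t["sent"]) <= window
    ]

# Entities from example #2, tagged with their sentence index
entities = [
    {"text": "Snapchat", "label": "FIRM", "sent": 0},
    {"text": "augmented reality", "label": "TECH", "sent": 1},
    {"text": "computer vision", "label": "TECH", "sent": 1},
]
print(cooccurrence_relations(entities))
# [('Snapchat', 'augmented reality'), ('Snapchat', 'computer vision')]
```

This gets example #2 right only because there is a single firm in the window; it has no way to reject a spurious FIRM/TECH pair, which is exactly what I hope annotation and training would fix.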
I've tried dependency parsing, but since many sentences have a complex structure, I fail to capture a significant number of relations. I've started annotating with `prodigy rel.manual`. Relations always go in the direction FIRM -> TECH (with one-to-many mappings in both directions).
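For reference, this is how I would post-process one annotated record into FIRM -> TECH pairs. This is my possibly-off understanding of the relations output format (the `head_span`/`child_span` field names and the `USES` label are my assumptions), so please correct me if the actual schema differs:

```python
# Turn one annotated relations record into (head, child, label) triples.
# The record layout below is my assumption about the export format;
# the "USES" relation label is one I made up for annotation.
import json

record = json.loads("""
{
  "text": "IBM has identified hybrid cloud as the growth area it will focus on.",
  "spans": [
    {"start": 0, "end": 3, "label": "FIRM"},
    {"start": 19, "end": 31, "label": "TECH"}
  ],
  "relations": [
    {"head_span": {"start": 0, "end": 3, "label": "FIRM"},
     "child_span": {"start": 19, "end": 31, "label": "TECH"},
     "label": "USES"}
  ]
}
""")

def relation_pairs(rec):
    """Extract (head_text, child_text, label) triples from a record
    by slicing the head/child spans out of the raw text."""
    text = rec["text"]
    return [
        (text[r["head_span"]["start"]:r["head_span"]["end"]],
         text[r["child_span"]["start"]:r["child_span"]["end"]],
         r["label"])
        for r in rec["relations"]
    ]

print(relation_pairs(record))
# [('IBM', 'hybrid cloud', 'USES')]
```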
Is this approach theoretically feasible for extracting the relations between FIRM and TECH entities, with annotations and training yielding better results than dependency parsing?
Is coreference resolution better suited to this specific task? (The directionality of the relation is not strictly needed, since the hierarchy from FIRM to TECH is already implied by the entity labels.)
Since there is currently no `prodigy train ner` equivalent for training a model on custom relations, is there currently a way to try out whether this works?
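In case it helps frame the question: my fallback idea (entirely my own sketch, not a Prodigy feature) would be to export the annotations and reduce the problem to binary classification over candidate FIRM x TECH pairs, where a pair is positive if I annotated that relation and negative otherwise. Building the labeled pairs might look like this:

```python
# Hypothetical fallback: turn exported relation annotations into
# labeled candidate pairs for an off-the-shelf binary classifier.
from itertools import product

def candidate_pairs(spans, gold_relations):
    """Label every FIRM x TECH span pair: 1 if it was annotated as a
    relation, 0 otherwise (the negatives come for free)."""
    firms = [s for s in spans if s["label"] == "FIRM"]
    techs = [s for s in spans if s["label"] == "TECH"]
    gold = {(g["head"], g["child"]) for g in gold_relations}
    return [
        ((f["text"], t["text"]), int((f["text"], t["text"]) in gold))
        for f, t in product(firms, techs)
    ]

# Spans and gold relations drawn from examples #1 and #2
spans = [
    {"text": "IBM", "label": "FIRM"},
    {"text": "Snapchat", "label": "FIRM"},
    {"text": "hybrid cloud", "label": "TECH"},
    {"text": "augmented reality", "label": "TECH"},
]
gold = [{"head": "IBM", "child": "hybrid cloud"},
        {"head": "Snapchat", "child": "augmented reality"}]
print(candidate_pairs(spans, gold))
# [(('IBM', 'hybrid cloud'), 1), (('IBM', 'augmented reality'), 0),
#  (('Snapchat', 'hybrid cloud'), 0), (('Snapchat', 'augmented reality'), 1)]
```

But if there is a more idiomatic way to evaluate relation annotations, I'd much rather use that.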
Thank you very much in advance for any pointers, and I'm looking forward to exploring Prodigy further!