I'm having a lot of fun exploring Prodigy for a text mining task. Training an NER model from annotations is extremely rewarding, and it works surprisingly well with only a small number of manual annotations.
So far, I've been using `prodigy train ner ...` for assigning entities. Let's say I have two entity types, `FIRM` and `TECH`. Some data to illustrate my task:
[IBM][FIRM] has identified [hybrid cloud][TECH] as the growth area it will focus on.
In early March 2017, [Snapchat][FIRM] officially went public. The company is one of the key players in [augmented reality][TECH] apps, and continues to invest in [computer vision][TECH] research.
[Amazon][FIRM] offers the most diverse offerings in [cloud computing][TECH]. [Microsoft][FIRM] offers similar solutions, and has ...
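To make the task concrete, the bracketed examples above can be written as character-offset span annotations, the JSONL-style format that Prodigy and spaCy work with. This is just a sketch; the offsets are computed with `str.find` purely for illustration:

```python
# Sketch: turn (surface string, label) pairs into character-offset spans,
# the shape Prodigy/spaCy expect for NER examples.

def make_example(text, ents):
    """ents: list of (surface_string, label) tuples appearing in text."""
    spans = []
    for surface, label in ents:
        start = text.find(surface)  # fine for illustration; assumes one occurrence
        spans.append({"start": start, "end": start + len(surface), "label": label})
    return {"text": text, "spans": spans}

example = make_example(
    "IBM has identified hybrid cloud as the growth area it will focus on.",
    [("IBM", "FIRM"), ("hybrid cloud", "TECH")],
)
print(example)
# {"text": ..., "spans": [{"start": 0, "end": 3, "label": "FIRM"},
#                         {"start": 19, "end": 31, "label": "TECH"}]}
```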
Ideally, I would like to extract the following relationships (across sentence boundaries if possible, see example #2, and with multiple associations, see examples #2 and #3):
- Simple relation: `[IBM] -> [hybrid cloud]`
- One-to-many `FIRM`: `[Snapchat] -> [augmented reality]` and `[Snapchat] -> [computer vision]`
- One-to-many `TECH`: `[Amazon] -> [cloud computing]` and `[Microsoft] -> [cloud computing]`
I've tried using dependency parsing, but with many sentences having a complex structure, I fail to capture a significant number of relations. I've started annotating with `prodigy rel.manual`. Relations always go in the direction `FIRM -> TECH` (with one-to-many in both directions).
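For reference, here's roughly how I imagine pulling `FIRM -> TECH` pairs out of the resulting annotations. The record layout below (a `relations` list whose entries carry `head_span`/`child_span` dicts plus a relation `label`) is my assumption about what `rel.manual` exports, and the `USES` label is hypothetical; the exported JSONL should be checked for the exact keys:

```python
# Sketch: extract FIRM -> TECH surface-string pairs from a rel.manual-style
# record. Key names ("relations", "head_span", "child_span") are assumptions
# about the export format -- verify against your own db-out JSONL.

record = {
    "text": "IBM has identified hybrid cloud as the growth area it will focus on.",
    "relations": [
        {
            "head_span": {"start": 0, "end": 3, "label": "FIRM"},
            "child_span": {"start": 19, "end": 31, "label": "TECH"},
            "label": "USES",  # hypothetical relation label
        }
    ],
}

def firm_tech_pairs(record):
    text = record["text"]
    for rel in record.get("relations", []):
        head, child = rel["head_span"], rel["child_span"]
        if head["label"] == "FIRM" and child["label"] == "TECH":
            yield (text[head["start"]:head["end"]],
                   text[child["start"]:child["end"]])

print(list(firm_tech_pairs(record)))
# [('IBM', 'hybrid cloud')]
```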
My questions:

- Is this approach theoretically feasible for extracting the relations between `FIRM` and `TECH` entities, with annotations and training yielding better results than dependency parsing?
- Is coreference better suited for this specific task (directionality of the relation is not strictly needed, since the hierarchy from `FIRM -> TECH` is clear)?
- Since there is currently no `prodigy train ner` equivalent for training a model with custom relations, is there a way to try out whether this works?
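In the meantime, one crude baseline I can imagine (my own workaround, not a Prodigy feature) is to pair every `FIRM` with every `TECH` that the trained NER model finds in the same text, which would at least cover cases like example #3:

```python
# Sketch: co-occurrence baseline over plain NER output. Every FIRM in a
# document is paired with every TECH in that document -- no trained relation
# model involved, just a sanity check for the task.
from itertools import product

def cooccurrence_relations(ents):
    """ents: list of (surface, label) tuples from one document."""
    firms = [s for s, l in ents if l == "FIRM"]
    techs = [s for s, l in ents if l == "TECH"]
    return list(product(firms, techs))

ents = [("Amazon", "FIRM"), ("cloud computing", "TECH"), ("Microsoft", "FIRM")]
print(cooccurrence_relations(ents))
# [('Amazon', 'cloud computing'), ('Microsoft', 'cloud computing')]
```

This obviously over-generates for texts mentioning unrelated firms and technologies, but it might be a useful yardstick to beat with a proper relation model.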
Thank you very much in advance for any pointers, and I'm looking forward to exploring Prodigy further!