I'm writing a module that, given a news abstract, determines the most probable location where the described events take place. For example, in "Trade talks between China and USA were held in Singapore" and "Singapore hosted trade talks between China and USA", the correct GPE is Singapore.
I experimented with spaCy's dependency parse and found that I could probably write very complicated rules based on the tokens' dep_ attributes, and possibly on the shape of the parse tree. But this seems more like a task for a trained model, perhaps a seq2seq model that outputs the GPE entities from the doc ordered by probability.
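A minimal sketch of the rule-based route described above, written so it runs without spaCy installed. `Tok` and `make_doc` are hypothetical stand-ins that expose the same attribute names as a spaCy `Token` (`text`, `lemma_`, `dep_`, `ent_type_`, `head`), and the parses are hand-written guesses at what an English model might produce. The word lists and score values are assumptions chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Tok:
    """Hypothetical stand-in for a spaCy Token: only the attributes the rules read."""
    text: str
    lemma_: str
    dep_: str
    ent_type_: str = ""
    head: Optional["Tok"] = field(default=None, repr=False)


def make_doc(rows) -> List[Tok]:
    """Build linked tokens from (text, lemma, dep, head_index, ent_type) rows."""
    toks = [Tok(text, lemma, dep, ent) for text, lemma, dep, _, ent in rows]
    for tok, row in zip(toks, rows):
        tok.head = toks[row[3]]  # the ROOT token points at itself, as in spaCy
    return toks


# Illustrative (assumed) word lists; a real system would need many more entries.
LOCATIVE_PREPS = {"in", "at"}
HOSTING_VERBS = {"host"}


def has_ancestor(tok, lemma: str) -> bool:
    """Walk up the head chain to the sentence root, looking for `lemma`."""
    cur = tok
    while cur.dep_ != "ROOT":
        cur = cur.head
        if cur.lemma_.lower() == lemma:
            return True
    return False


def score_gpe(tok) -> float:
    score = 0.0
    head_lemma = tok.head.lemma_.lower()
    # "... were held in Singapore": object of a locative preposition
    if tok.dep_ == "pobj" and head_lemma in LOCATIVE_PREPS:
        score += 2.0
    # "Singapore hosted ...": subject of a hosting verb
    if tok.dep_ == "nsubj" and head_lemma in HOSTING_VERBS:
        score += 2.0
    # "... between China and USA": likely participants, not the venue
    if has_ancestor(tok, "between"):
        score -= 1.0
    return score


def rank_gpes(tokens) -> List[str]:
    """Return GPE texts, highest rule score first (ties keep document order)."""
    gpes = [t for t in tokens if t.ent_type_ == "GPE"]
    return [t.text for t in sorted(gpes, key=score_gpe, reverse=True)]


# Hand-written approximations of spaCy-style parses; real parses may differ.
held = make_doc([
    ("Trade", "trade", "compound", 1, ""),
    ("talks", "talk", "nsubjpass", 7, ""),
    ("between", "between", "prep", 1, ""),
    ("China", "China", "pobj", 2, "GPE"),
    ("and", "and", "cc", 3, ""),
    ("USA", "USA", "conj", 3, "GPE"),
    ("were", "be", "auxpass", 7, ""),
    ("held", "hold", "ROOT", 7, ""),
    ("in", "in", "prep", 7, ""),
    ("Singapore", "Singapore", "pobj", 8, "GPE"),
])
hosted = make_doc([
    ("Singapore", "Singapore", "nsubj", 1, "GPE"),
    ("hosted", "host", "ROOT", 1, ""),
    ("trade", "trade", "compound", 3, ""),
    ("talks", "talk", "dobj", 1, ""),
    ("between", "between", "prep", 3, ""),
    ("China", "China", "pobj", 4, "GPE"),
    ("and", "and", "cc", 5, ""),
    ("USA", "USA", "conj", 5, "GPE"),
])
print(rank_gpes(held))    # ['Singapore', 'China', 'USA']
print(rank_gpes(hosted))  # ['Singapore', 'China', 'USA']
```

Because a spaCy `Doc` iterates over `Token` objects with these same attribute names, `rank_gpes(nlp(text))` should work unchanged on real parses of single-token GPEs; multi-token names such as "New York" would need to be handled via `doc.ents` instead. The sketch also shows why the rules grow complicated quickly: every new phrasing ("took place in", "in Singapore, ...") needs another case.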
I'm not sure Prodigy is the right tool for building an end-to-end solution (annotation, model definition, training, ...), but if it is, how would I even start? Or would I be better off using the token data in the doc to feed a custom model?
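If the custom-model route is taken, one option (a swapped-in technique, not the seq2seq idea from the post) is to treat each GPE entity as a candidate and rank the candidates with a linear model over dependency features, trained as a structured perceptron. `Tok`, `make_doc`, `features`, `train` and `rank` are hypothetical names, the parses are hand-written, and two training sentences only demonstrate the mechanics, not real accuracy.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Tok:
    """Hypothetical stand-in for a spaCy Token (only the attributes read below)."""
    text: str
    lemma_: str
    dep_: str
    ent_type_: str = ""
    head: Optional["Tok"] = field(default=None, repr=False)


def make_doc(rows) -> List[Tok]:
    """Build linked tokens from (text, lemma, dep, head_index, ent_type) rows."""
    toks = [Tok(text, lemma, dep, ent) for text, lemma, dep, _, ent in rows]
    for tok, row in zip(toks, rows):
        tok.head = toks[row[3]]
    return toks


def features(tok) -> List[str]:
    """Sparse string features describing where a GPE sits in the parse."""
    head = tok.head.lemma_.lower()
    feats = [f"dep={tok.dep_}", f"head={head}", f"dep+head={tok.dep_}|{head}"]
    if tok.head.dep_ != "ROOT":
        feats.append(f"grandhead={tok.head.head.lemma_.lower()}")
    return feats


def candidates(doc) -> List[Tok]:
    return [t for t in doc if t.ent_type_ == "GPE"]


def score(weights: Dict[str, float], tok) -> float:
    return sum(weights.get(f, 0.0) for f in features(tok))


def train(examples: List[Tuple[List[Tok], str]], epochs: int = 5) -> Dict[str, float]:
    """Perceptron: when the top-scored candidate is wrong, shift weights toward gold."""
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for doc, gold_text in examples:
            cands = candidates(doc)
            pred = max(cands, key=lambda t: score(weights, t))
            if pred.text != gold_text:
                gold = next(t for t in cands if t.text == gold_text)
                for f in features(gold):
                    weights[f] += 1.0
                for f in features(pred):
                    weights[f] -= 1.0
    return weights


def rank(weights: Dict[str, float], doc) -> List[str]:
    ranked = sorted(candidates(doc), key=lambda t: score(weights, t), reverse=True)
    return [t.text for t in ranked]


# Hand-written parses (assumed, not real model output).
held = make_doc([
    ("Trade", "trade", "compound", 1, ""), ("talks", "talk", "nsubjpass", 7, ""),
    ("between", "between", "prep", 1, ""), ("China", "China", "pobj", 2, "GPE"),
    ("and", "and", "cc", 3, ""), ("USA", "USA", "conj", 3, "GPE"),
    ("were", "be", "auxpass", 7, ""), ("held", "hold", "ROOT", 7, ""),
    ("in", "in", "prep", 7, ""), ("Singapore", "Singapore", "pobj", 8, "GPE"),
])
hosted = make_doc([
    ("Singapore", "Singapore", "nsubj", 1, "GPE"), ("hosted", "host", "ROOT", 1, ""),
    ("trade", "trade", "compound", 3, ""), ("talks", "talk", "dobj", 1, ""),
    ("between", "between", "prep", 3, ""), ("China", "China", "pobj", 4, "GPE"),
    ("and", "and", "cc", 5, ""), ("USA", "USA", "conj", 5, "GPE"),
])
paris = make_doc([  # unseen sentence: "Germany and France signed a deal in Paris"
    ("Germany", "Germany", "nsubj", 3, "GPE"), ("and", "and", "cc", 0, ""),
    ("France", "France", "conj", 0, "GPE"), ("signed", "sign", "ROOT", 3, ""),
    ("a", "a", "det", 5, ""), ("deal", "deal", "dobj", 3, ""),
    ("in", "in", "prep", 3, ""), ("Paris", "Paris", "pobj", 6, "GPE"),
])

weights = train([(held, "Singapore"), (hosted, "Singapore")])
print(rank(weights, paris))  # ['Paris', 'Germany', 'France']
```

Framing the problem as ranking candidates the NER has already found, rather than generating output with seq2seq, keeps predictions restricted to GPEs that actually occur in the abstract. In practice the training pairs would come from annotated abstracts (for instance collected with an annotation tool such as Prodigy) and the toy perceptron would be replaced by a proper classifier.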
Thanks!