Determining most salient geographic entities in news text

I'm writing a module that determines, from a news abstract, the most probable location at which the described events take place. Examples: "Trade talks between China and USA were held in Singapore", "Singapore hosted trade talks between China and USA". In both cases, the correct GPE is Singapore.

I experimented with the dependency parse in spaCy and found that I could probably write very complicated rules based on the `dep_` properties of tokens and possibly the parse tree morphology. But it seems like a task for training a model, probably a seq2seq model that would output the doc's GPE entities in order of probability.
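For context, this is roughly the kind of signal I was looking at; a minimal sketch assuming the `en_core_web_sm` model (the entities it tags as GPE may vary):

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Trade talks between China and USA were held in Singapore.")

# Inspect each GPE entity's syntactic context: the dependency label of its
# root token and the head it attaches to.
for ent in doc.ents:
    if ent.label_ == "GPE":
        root = ent.root
        print(ent.text, root.dep_, root.head.text)

# "Singapore" attaches as the object of the preposition "in" (pobj of "held"),
# while "China"/"USA" hang off "between" -- the kind of distinction a
# hand-written rule would have to encode for every phrasing variant.
```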

I'm not sure that Prodigy is the right tool for constructing an end-to-end solution (annotation, model definition, training, ...), but if it is, how would I even start? Or would I be better off using the token data in the doc to feed into a custom model?

Thanks!

I can see how an ML solution would be helpful here, but is it necessary to do it sequence-to-sequence? You could have a model that makes one binary prediction per GPE entity, using features from a transformer encoder or BiLSTM. You'd probably want to write the custom model in PyTorch, but it should be easy to do the annotations in Prodigy.
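To make the shape of that concrete, here's a minimal sketch of the per-entity scorer in PyTorch. The class name, feature dimension, and the random features standing in for encoder output are all illustrative assumptions, not a prescribed architecture:

```python
import torch
import torch.nn as nn

class GPESalienceScorer(nn.Module):
    """Scores each candidate GPE entity independently: 1 = main event location.

    feat_dim is whatever your encoder produces upstream, e.g. pooled BiLSTM
    states or transformer embeddings for the entity span plus its context.
    """

    def __init__(self, feat_dim: int = 768, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, entity_feats: torch.Tensor) -> torch.Tensor:
        # entity_feats: (num_entities, feat_dim) -> (num_entities,) logits
        return self.mlp(entity_feats).squeeze(-1)


# Toy usage: three candidate GPEs from one abstract, random vectors as
# stand-ins for real encoder features (e.g. China, USA, Singapore).
model = GPESalienceScorer()
feats = torch.randn(3, 768)
logits = model(feats)
loss = nn.BCEWithLogitsLoss()(logits, torch.tensor([0.0, 0.0, 1.0]))

# At inference, pick the highest-scoring entity (or all above a threshold).
probs = torch.sigmoid(logits)
```

One nice property of framing it this way: the model never has to generate text, so you can't get hallucinated locations, only a ranking over entities that are actually in the doc.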

I do think having standard GPE detection as a preprocessing step will be useful to you, but you can also try doing it directly, without that step.
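With the preprocessing route, one way to set up annotation is to pre-highlight the GPE candidates so the annotator only has to pick the salient one. A rough sketch, assuming the `en_core_web_sm` model and writing tasks in the JSONL span format Prodigy accepts (the file name and helper are just placeholders):

```python
import json
import spacy

nlp = spacy.load("en_core_web_sm")

def gpe_candidates(text: str) -> dict:
    """Turn one abstract into an annotation task: the text plus its GPE spans."""
    doc = nlp(text)
    spans = [
        {"start": ent.start_char, "end": ent.end_char, "label": "GPE"}
        for ent in doc.ents
        if ent.label_ == "GPE"
    ]
    return {"text": text, "spans": spans}

abstracts = ["Singapore hosted trade talks between China and USA."]
with open("gpe_tasks.jsonl", "w") as f:
    for abstract in abstracts:
        f.write(json.dumps(gpe_candidates(abstract)) + "\n")
```

The same candidate extraction can then feed the binary classifier at training and inference time, so annotation and modeling stay aligned on the same entity spans.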