Distinguishing location entities using spaCy

Zim1-finest · November 20, 2021, 9:09am

Hello,

I have a project that is driving me crazy, and I am not sure whether spaCy + Prodigy can help me here.

I would like to distinguish between a user's actual location and a remote location mentioned in tweets. For example, the tweet "Having a coffee in Cup&Cino bar, Salzburg, Austria" would label "Cup&Cino bar, Salzburg, Austria" as the user's actual location at the time of sending the post,

whilst the tweet "Around this time two years ago I was touring Egypt" would label "Egypt" as a remote location (not the user's actual location at the time of sending the post).

I thought of doing this task with a blank:en spaCy model and two named entity labels: REMOTE and ACTUAL. My reasoning is that, since spaCy uses context to distinguish, say, Paris the name from Paris the city, it should be able to distinguish the two types of location after seeing enough examples. However, I am not entirely sure my logic is correct and would like some expert judgement before I invest a lot of time in labelling.

I had also thought of using keyword phrases, for example 'I am at ...', 'staying in ...', 'currently in ...' to locate ACTUAL locations and 'travelling to ...', 'just left ...', 'will be in ...' to locate REMOTE locations, but it seems pattern matching does not allow that.

What would be the most feasible approach here?

ljvmiranda921 · November 23, 2021, 4:27am

Hi @Zim1-finest, you have a couple of options for this:

Perhaps a better approach is to do it in two stages: (1) label everything as a LOCATION first, then (2) make the "actual" vs. "remote" distinction in a separate step. The first stage can be treated as an NER problem, and the second as a text classification one.
Another signal you can try is the tweet's metadata, specifically the GeoObject: from there you can check the location from which the tweet was sent. Of course, we know nothing about the context in which a tweet was sent, so the metadata location may not map one-to-one onto the location mentioned in the text. Still, it is another feature you can try.
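
To make the two-stage idea concrete, here is a minimal sketch, assuming spaCy v3 and a blank English pipeline. It adds an `ner` component with a single LOCATION label (stage 1) and an exclusive `textcat` component with ACTUAL and REMOTE classes (stage 2). The two training examples are just the tweets from the question, the `build_examples` helper is hypothetical, and a handful of updates on two examples will not produce a usable model; the point is only to show the shape of the annotations.

```python
import spacy
from spacy.training import Example

# Toy data taken from the question: (text, location span, tweet-level class).
TRAIN = [
    ("Having a coffee in Cup&Cino bar, Salzburg, Austria",
     "Cup&Cino bar, Salzburg, Austria", "ACTUAL"),
    ("Around this time two years ago I was touring Egypt",
     "Egypt", "REMOTE"),
]

def build_examples(nlp):
    """Hypothetical helper: turn the toy data into spaCy Example objects."""
    examples = []
    for text, loc, label in TRAIN:
        start = text.index(loc)  # character offset of the location span
        annotations = {
            # Stage 1: every location gets the same generic label.
            "entities": [(start, start + len(loc), "LOCATION")],
            # Stage 2: one mutually exclusive class per tweet.
            "cats": {
                "ACTUAL": float(label == "ACTUAL"),
                "REMOTE": float(label == "REMOTE"),
            },
        }
        examples.append(Example.from_dict(nlp.make_doc(text), annotations))
    return examples

nlp = spacy.blank("en")
nlp.add_pipe("ner")      # stage 1: find LOCATION spans
nlp.add_pipe("textcat")  # stage 2: ACTUAL vs. REMOTE for the whole tweet

examples = build_examples(nlp)
optimizer = nlp.initialize(lambda: examples)  # labels are inferred from the examples
for _ in range(5):
    nlp.update(examples, sgd=optimizer)

doc = nlp("Currently in Vienna")
print([(ent.text, ent.label_) for ent in doc.ents], doc.cats)
```

In Prodigy terms, this roughly corresponds to one annotation pass for the LOCATION spans and a second pass for the tweet-level category, so the two decisions can be labelled and evaluated separately.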

Zim1-finest · November 23, 2021, 1:22pm

Thank you @ljvmiranda921,

I already use the second method, but unfortunately more than 95% of tweets do not have a location in the metadata.

The first option you suggested would be very useful indeed. The only challenge, which I had not mentioned earlier, is that some tweets contain both an ACTUAL and a REMOTE location, for example:

It's great to have my friend visiting me here in Salzburg after a 10 hour trip from Amsterdam.

ACTUAL location: Salzburg
REMOTE location: Amsterdam

ljvmiranda921 · November 24, 2021, 12:53am

Hi @Zim1-finest,

"The only challenge that I had not mentioned earlier is that there are some tweets where both ACTUAL and REMOTE locations are present, for example:"

Perhaps you can frame it as a multilabel text categorization problem?
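
As a rough sketch of what that could look like, assuming spaCy v3: the `textcat_multilabel` component scores each label independently, so a single tweet can be both ACTUAL and REMOTE. The three training tweets come from this thread, and the resulting scores are not meaningful after so little training.

```python
import spacy
from spacy.training import Example

# Tweets from this thread; the last one carries both labels at once.
TRAIN = [
    ("Having a coffee in Cup&Cino bar, Salzburg, Austria",
     {"ACTUAL": 1.0, "REMOTE": 0.0}),
    ("Around this time two years ago I was touring Egypt",
     {"ACTUAL": 0.0, "REMOTE": 1.0}),
    ("It's great to have my friend visiting me here in Salzburg "
     "after a 10 hour trip from Amsterdam.",
     {"ACTUAL": 1.0, "REMOTE": 1.0}),
]

nlp = spacy.blank("en")
nlp.add_pipe("textcat_multilabel")  # labels are not mutually exclusive

examples = [
    Example.from_dict(nlp.make_doc(text), {"cats": cats})
    for text, cats in TRAIN
]
optimizer = nlp.initialize(lambda: examples)  # labels are inferred from the examples
for _ in range(5):
    nlp.update(examples, sgd=optimizer)

doc = nlp("Just left Amsterdam, now staying in Salzburg")
print(doc.cats)  # one independent score per label
```

Note that these labels apply to the whole tweet: they say that an actual and a remote location are both mentioned, not which span is which, so the LOCATION spans from the NER step would still need to be matched to the labels in some further step.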

Zim1-finest · November 24, 2021, 7:03am

Oh yes, I had not thought about that.

It would certainly be helpful.
Thank you.
