Distinguishing location entities using spaCy

Hello,

So I have a project that is driving me crazy, and I am not sure if spaCy + Prodigy can help me here.

I would like to distinguish between a user's actual location and a remote location mentioned in tweets. For example, "Having a coffee in Cup&Cino bar, Salzburg, Austria" would label Cup&Cino bar, Salzburg, Austria as the user's actual location at the time of sending the post.

By contrast, in the tweet "Around this time two years ago I was touring Egypt", Egypt would be labelled as a remote location (not the user's actual location at the time of sending the post).

I thought of doing this task with a blank "en" spaCy model and two named entity labels: REMOTE and ACTUAL. My reasoning is that since spaCy uses context to distinguish, say, Paris the name from Paris the city, it should, after enough examples, be able to distinguish the two types of locations. However, I am not entirely sure my logic is correct and would like some expert judgement before I invest a lot of time in labelling.
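A minimal sketch of that setup, assuming spaCy v3 is installed: a blank English pipeline with an NER component that knows only ACTUAL and REMOTE, trained on the two tweets above. The helper `char_span` is hypothetical, and two examples are nowhere near enough for a useful model; the point is only to show the shape of the data and the training loop.

```python
import random

import spacy
from spacy.training import Example


def char_span(text, mention, label):
    """Hypothetical helper: (start, end, label) for the first occurrence of `mention`."""
    start = text.index(mention)
    return (start, start + len(mention), label)


# Toy data built from the two tweets in this post; a real dataset would
# need many annotated tweets for each label.
RAW = [
    ("Having a coffee in Cup&Cino bar, Salzburg, Austria",
     "Cup&Cino bar, Salzburg, Austria", "ACTUAL"),
    ("Around this time two years ago I was touring Egypt", "Egypt", "REMOTE"),
]
TRAIN_DATA = [(text, {"entities": [char_span(text, m, label)]}) for text, m, label in RAW]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for label in ("ACTUAL", "REMOTE"):
    ner.add_label(label)

# Wrap each annotated text in an Example (predicted doc + gold annotations).
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]
optimizer = nlp.initialize(get_examples=lambda: examples)

random.seed(0)
for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

doc = nlp("Having lunch in Vienna today")
print([(ent.text, ent.label_) for ent in doc.ents])
```

Because the labels sit on entity spans, a single tweet can carry both types at once. Whether the model actually learns the distinction depends on how much surrounding context its features capture, so it seems worth checking on held-out tweets before labelling at scale.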

I had also thought of using keyword phrases, for example 'I am at ...', 'staying in ...', 'currently in ...' to find ACTUAL locations and 'travelling to ...', 'just left ...', 'will be in ...' to find REMOTE locations, but it seems pattern matching does not allow that.
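The keyword-phrase idea can be sketched in plain Python, assuming the location string has already been found (for example by an NER model). `guess_location_type` is a hypothetical helper, not a spaCy or Prodigy API; it only checks whether one of the cue phrases listed above appears immediately before the mention.

```python
from typing import Optional

# Cue phrases taken from the post above (lowercased). Real tweets will use
# many more variants, so these lists are only a starting point.
ACTUAL_CUES = ["i am at", "staying in", "currently in"]
REMOTE_CUES = ["travelling to", "just left", "will be in"]


def guess_location_type(text: str, location: str, window: int = 40) -> Optional[str]:
    """Guess ACTUAL/REMOTE for `location` from the words right before it.

    Returns None if the location is not in the text or no cue precedes it.
    """
    idx = text.find(location)
    if idx == -1:
        return None
    # Normalise the preceding text: lowercase and collapse whitespace.
    before = " ".join(text[max(0, idx - window):idx].lower().split())
    for label, cues in (("ACTUAL", ACTUAL_CUES), ("REMOTE", REMOTE_CUES)):
        for cue in cues:
            # Require a word boundary so e.g. "concurrently in" does not match.
            if before == cue or before.endswith(" " + cue):
                return label
    return None


print(guess_location_type("I am at Heathrow right now", "Heathrow"))
print(guess_location_type("Just left Berlin, heading home", "Berlin"))
print(guess_location_type("Around this time two years ago I was touring Egypt", "Egypt"))
```

The Egypt tweet gets None because none of the cues appear, which suggests a heuristic like this is better used as a source of weak labels or extra features than as the whole solution.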

What would be the most feasible approach here?

Hi @Zim1-finest, you have a few options for this:

  1. Perhaps a better approach is to do it in two stages: (1) label everything as LOCATION first, then (2) make the "actual" vs. "remote" distinction in a separate step. The first stage may be an NER problem, whereas the second can be a text classification one.
  2. Another signal you can try is the Tweet's metadata, specifically the GeoObject, from which you can check the location the tweet was sent from. Of course, we know nothing about the context in which a tweet was sent, so the match may not be one-to-one. But it's another feature you can try.
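The two-stage idea can be sketched in plain Python, assuming stage 1 (an NER model with a single LOCATION label) has already returned character offsets. One common way to hand each mention to a stage-2 text classifier is to make one copy of the tweet per location and wrap only that mention in marker strings. `mark_mentions` and the `[LOC]`/`[/LOC]` markers are hypothetical choices, not part of spaCy or Prodigy.

```python
from typing import List, Tuple

# (start_char, end_char) of one LOCATION found by the stage-1 NER model.
Span = Tuple[int, int]


def mark_mentions(text: str, spans: List[Span],
                  open_tag: str = "[LOC]", close_tag: str = "[/LOC]") -> List[str]:
    """Return one copy of `text` per span, with only that span wrapped in markers."""
    return [
        text[:start] + open_tag + text[start:end] + close_tag + text[end:]
        for start, end in spans
    ]


tweet = ("It's great to have my friend visiting me here in Salzburg "
         "after a 10-hour trip from Amsterdam.")
# Offsets a stage-1 model might return; computed with str.find for brevity.
spans = [(tweet.find(loc), tweet.find(loc) + len(loc)) for loc in ("Salzburg", "Amsterdam")]
for marked in mark_mentions(tweet, spans):
    print(marked)
```

Each marked copy would then be classified as ACTUAL or REMOTE (for example with a spaCy text classifier), so a tweet containing several locations simply yields one prediction per mention.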

Thank you @ljvmiranda921,

I already use the second method, but unfortunately more than 95% of tweets have no location in their metadata.

The first option you suggested would be very useful indeed. The only challenge that I had not mentioned earlier is that there are some tweets where both ACTUAL and REMOTE locations are present for example:

It's great to have my friend visiting me here in Salzburg after a 10-hour trip from Amsterdam.

ACTUAL location: Salzburg
REMOTE location: Amsterdam
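In training data for an entity-level setup, this case needs no special handling: each mention simply gets its own label. Below is a small, hypothetical helper (not a spaCy API) that turns an annotation like the one above into spaCy-style (start_char, end_char, label) offsets. It assumes each mention's first occurrence is the one meant, which breaks if a place name appears twice in a tweet.

```python
from typing import Dict, List, Tuple


def to_entities(text: str, mentions: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Convert {mention: label} into sorted (start, end, label) character offsets.

    Uses the first occurrence of each mention and raises if one is missing.
    """
    entities = []
    for mention, label in mentions.items():
        start = text.find(mention)
        if start == -1:
            raise ValueError(f"{mention!r} not found in text")
        entities.append((start, start + len(mention), label))
    return sorted(entities)


text = ("It's great to have my friend visiting me here in Salzburg "
        "after a 10-hour trip from Amsterdam.")
print(to_entities(text, {"Salzburg": "ACTUAL", "Amsterdam": "REMOTE"}))
# [(49, 57, 'ACTUAL'), (84, 93, 'REMOTE')]
```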

Hi @Zim1-finest,

"The only challenge that I had not mentioned earlier is that there are some tweets where both ACTUAL and REMOTE locations are present for example:"

Perhaps you can frame it as a multilabel text categorization problem?
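One reading of this suggestion is to give each tweet two independent labels, ACTUAL and REMOTE, so the Salzburg/Amsterdam tweet can have both set to 1.0. A minimal sketch with spaCy v3's `textcat_multilabel` component, assuming spaCy v3 is installed; the three training tweets come from this thread and are far too few for a real model.

```python
import random

import spacy
from spacy.training import Example

# Tweet-level labels: both can be 1.0 at the same time.
TRAIN_DATA = [
    ("Having a coffee in Cup&Cino bar, Salzburg, Austria",
     {"cats": {"ACTUAL": 1.0, "REMOTE": 0.0}}),
    ("Around this time two years ago I was touring Egypt",
     {"cats": {"ACTUAL": 0.0, "REMOTE": 1.0}}),
    ("It's great to have my friend visiting me here in Salzburg "
     "after a 10-hour trip from Amsterdam.",
     {"cats": {"ACTUAL": 1.0, "REMOTE": 1.0}}),
]

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat_multilabel")
for label in ("ACTUAL", "REMOTE"):
    textcat.add_label(label)

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]
optimizer = nlp.initialize(get_examples=lambda: examples)

random.seed(0)
for _ in range(20):
    random.shuffle(examples)
    nlp.update(examples, sgd=optimizer)

doc = nlp("Staying in Vienna tonight, flying back to Oslo tomorrow")
print(doc.cats)  # one independent score per label
```

Unlike the exclusive `textcat` component, `textcat_multilabel` scores each label independently, so the values in `doc.cats` need not sum to 1. Note that tweet-level labels say which kinds of location occur, not which mention is which; that would still need an entity-level step.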

Oh yes, I had not thought about that.

It would certainly be helpful.
Thank you.