How to deal with ambiguity in annotations

EFgit · January 21, 2019, 11:19am

I am training a new entity FULL_NAME which would detect full names in texts, but I have a problem deciding on the correct way to handle names inside addresses, e.g. “Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom”. Here I know that Albert Einstein is the person himself, and David Murray is part of the address and not a FULL_NAME in this context (according to my definition of full name).

What I am afraid of is whether the model will be able to learn the difference, or whether the boundary between “it’s a FULL_NAME” and “it’s not” is too fuzzy here.
If I choose to accept David Murray as a FULL_NAME, with the idea that later I could remove it with some rule-based approach, is it a problem that I have another entity ADDRESS, which in this example would be
“DD David Murray John Tower, Swindon, Wiltshire United Kingdom”, and which would overlap with the entity FULL_NAME?
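
To make the rule-based idea concrete, here is a minimal sketch in plain Python. It assumes the predictions are already available as (start, end, label) character offsets; the helper names `find_span` and `drop_names_inside_addresses` are made up for illustration and are not part of any library.

```python
def find_span(text, phrase, label):
    """Return (start, end, label) for the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def drop_names_inside_addresses(spans):
    """Remove FULL_NAME spans that lie completely inside an ADDRESS span."""
    addresses = [(s, e) for s, e, label in spans if label == "ADDRESS"]
    kept = []
    for start, end, label in spans:
        inside = any(a_start <= start and end <= a_end
                     for a_start, a_end in addresses)
        if label == "FULL_NAME" and inside:
            continue  # a name fully contained in an address is dropped
        kept.append((start, end, label))
    return kept


text = ("Albert Einstein DD David Murray John Tower, "
        "Swindon, Wiltshire United Kingdom")

# Hypothetical predictions, as if they came from two separate models.
predicted = [
    find_span(text, "Albert Einstein", "FULL_NAME"),
    find_span(text, "David Murray", "FULL_NAME"),
    find_span(text, "DD David Murray John Tower, Swindon, "
                    "Wiltshire United Kingdom", "ADDRESS"),
]

filtered = drop_names_inside_addresses(predicted)
for start, end, label in filtered:
    print(label, "->", text[start:end])
```

This only works if the address span itself is predicted correctly, so it moves the problem to the ADDRESS entity rather than removing it.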

honnibal · January 24, 2019, 1:09am

I do think you’ll be making your task quite difficult if you distinguish names inside addresses like that. On the other hand, it’s true that you’ll get the overlap issue, so you wouldn’t be able to have the same model do the addresses and the names, if they overlap.

Have you considered annotating the elements of the address? So you would have subcategories like building, number, country etc. This might reduce the word-to-category ambiguity. The downside is that it might take much longer to annotate.
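
As a rough sketch of what element-level annotation could look like, here is the address from the question written as a plain dict with character-offset spans. The label names (BUILDING, CITY, COUNTY, COUNTRY) and the reading of "David Murray John Tower" as the building name are assumptions for illustration only, and "DD" is left unlabelled because its meaning isn't clear from the example.

```python
text = "DD David Murray John Tower, Swindon, Wiltshire United Kingdom"


def make_span(text, phrase, label):
    """Build a character-offset span for the first occurrence of phrase."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


# Hypothetical sub-category labels; pick whatever scheme fits your data.
example = {
    "text": text,
    "spans": [
        make_span(text, "David Murray John Tower", "BUILDING"),
        make_span(text, "Swindon", "CITY"),
        make_span(text, "Wiltshire", "COUNTY"),
        make_span(text, "United Kingdom", "COUNTRY"),
    ],
}

for span in example["spans"]:
    print(span["label"], "->", text[span["start"]:span["end"]])
```

With a scheme like this the spans never overlap, so a single model can carry all the labels.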

EFgit · January 24, 2019, 1:17pm

Hi @honnibal, thanks for the answer.

But again with this approach you would run into the same problem:

"DD David Murray John Tower, Swindon, Wiltshire United Kingdom"

is David Murray a person or a street name?

honnibal · January 25, 2019, 9:54pm

I think I might not understand the address myself, actually!

I thought David Murray was supposed to be the recipient? Or is it perhaps David Murray John Tower?

If David Murray is the recipient, you’d mark it as PERSON. If it’s David Murray John Tower, you’d probably use the facility label, FAC.

EFgit · January 29, 2019, 9:01am

I will try to explain it better this time:

Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom

How it should be processed is:

Albert Einstein ---> PERSON
DD David Murray John Tower, Swindon, Wiltshire United Kingdom ---> ADDRESS

Yes, this might be it...
