How to deal with ambiguity in annotations

I am training a new entity, FULL_NAME, to detect full names in texts, but I am having trouble deciding how to handle names inside addresses, e.g. “Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom”. Here I know that Albert Einstein is the person himself, while David Murray is part of the address and not a FULL_NAME in this context (according to my definition of a full name).

  • What I am afraid of is whether the model will be able to learn the difference, or whether the boundary between “it’s a FULL_NAME” and “it’s not” is too fuzzy here.
  • If I choose to accept David Murray as a FULL_NAME, with the idea that I could later remove it with some rule-based approach, is it a problem that I have another entity, ADDRESS, which in this example would be:
    “DD David Murray John Tower, Swindon, Wiltshire United Kingdom”, and would overlap with the entity FULL_NAME?

I do think you’ll be making your task quite difficult if you distinguish names inside addresses like that. On the other hand, it’s true that you’ll get the overlap issue, so you wouldn’t be able to have the same model do the addresses and the names, if they overlap.
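To make the overlap point concrete, here is a minimal sketch of the rule-based route, assuming names and addresses are predicted separately and each prediction is a plain (start, end, label) character span. The helper names `find_span` and `drop_names_inside_addresses` are made up for illustration, and the spans are written by hand rather than coming from a real model.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def find_span(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: build a span from the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def drop_names_inside_addresses(names: List[Span], addresses: List[Span]) -> List[Span]:
    """Keep only the name spans that are not fully contained in an address span."""
    kept = []
    for n_start, n_end, n_label in names:
        inside = any(
            a_start <= n_start and n_end <= a_end
            for a_start, a_end, _ in addresses
        )
        if not inside:
            kept.append((n_start, n_end, n_label))
    return kept


text = "Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom"

# Hand-written stand-ins for the output of two separate models (an assumption).
names = [
    find_span(text, "Albert Einstein", "FULL_NAME"),
    find_span(text, "David Murray", "FULL_NAME"),
]
addresses = [
    find_span(text, "DD David Murray John Tower, Swindon, Wiltshire United Kingdom", "ADDRESS"),
]

filtered = drop_names_inside_addresses(names, addresses)
print([text[start:end] for start, end, _ in filtered])
```

With this filter the name model can be annotated loosely (every name-like string is a FULL_NAME), and the address spans decide which ones to discard afterwards.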

Have you considered annotating the elements of the address? So you would have subcategories like building, number, country etc. This might reduce the word-to-category ambiguity. The downside is that it might take much longer to annotate.

Hi @honnibal, thanks for the answer.

But again with this approach you would run into the same problem:

“DD David Murray John Tower, Swindon, Wiltshire United Kingdom”

Is David Murray a person or a street name?

I think I might not understand the address myself, actually!

I thought David Murray was supposed to be the recipient? Or is it perhaps David Murray John Tower?

If David Murray is the recipient you’d mark it as PERSON. If it’s David Murray John Tower, you’d probably use the facility label, FAC.

I will try to explain it better this time:

Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom

How it should be processed is:

Albert Einstein —> PERSON
DD David Murray John Tower, Swindon, Wiltshire United Kingdom —> ADDRESS
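
As a rough sketch of what that target annotation looks like as character offsets: the dict layout with "text" and "spans" and the helper names `make_span` and `has_overlap` are assumptions for illustration, not a required format.

```python
text = "Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom"


def make_span(text, phrase, label):
    """Hypothetical helper: locate phrase in text and return its offsets."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


example = {
    "text": text,
    "spans": [
        make_span(text, "Albert Einstein", "PERSON"),
        make_span(
            text,
            "DD David Murray John Tower, Swindon, Wiltshire United Kingdom",
            "ADDRESS",
        ),
    ],
}


def has_overlap(spans):
    """Return True if any two spans share at least one character."""
    ordered = sorted(spans, key=lambda s: s["start"])
    return any(a["end"] > b["start"] for a, b in zip(ordered, ordered[1:]))


print(example["spans"])
print("overlap:", has_overlap(example["spans"]))
```

Annotated this way the two spans sit side by side without sharing any characters, so David Murray never has to carry a label of its own.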

Yes, this might be it…