I am training a new entity, FULL_NAME, to detect full names in texts, but I can't decide on the correct way to handle names inside addresses, e.g. “Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom”. Here I know that Albert Einstein is the person himself, while David Murray is part of the address and not a FULL_NAME in this context (according to my definition of a full name).
What I'm afraid of is that the model won't be able to learn the difference, because the boundary between “it's a FULL_NAME” and “it's not” is too fuzzy here.
If I choose to accept David Murray as a FULL_NAME, with the idea that I could later remove it with some rule-based approach, is it a problem that I have another entity, ADDRESS, which in this example would be “DD David Murray John Tower, Swindon, Wiltshire United Kingdom” and would overlap with the FULL_NAME entity?
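A minimal sketch of the rule-based clean-up mentioned above, assuming FULL_NAME and ADDRESS are predicted independently (e.g. by two separate models) and returned as character-offset spans. The function names `char_span` and `drop_names_inside_addresses` are hypothetical; the rule simply discards any FULL_NAME that lies entirely inside an ADDRESS.

```python
from typing import List, Tuple

# A span is (start_char, end_char_exclusive, label).
Span = Tuple[int, int, str]


def char_span(text: str, substring: str, label: str) -> Span:
    """Build a span for the first occurrence of `substring` (illustration only)."""
    start = text.find(substring)
    return (start, start + len(substring), label)


def drop_names_inside_addresses(spans: List[Span]) -> List[Span]:
    """Hypothetical rule: remove FULL_NAME spans fully contained in an ADDRESS span."""
    addresses = [(s, e) for s, e, label in spans if label == "ADDRESS"]
    kept: List[Span] = []
    for start, end, label in spans:
        inside_address = any(a_s <= start and end <= a_e for a_s, a_e in addresses)
        if label == "FULL_NAME" and inside_address:
            continue  # treat this name as part of the address, not as a person
        kept.append((start, end, label))
    return kept


text = "Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom"

# Assumed model output: both names tagged as FULL_NAME, plus the whole address.
predicted = [
    char_span(text, "Albert Einstein", "FULL_NAME"),
    char_span(text, "David Murray", "FULL_NAME"),
    char_span(text, "DD David Murray John Tower, Swindon, Wiltshire United Kingdom", "ADDRESS"),
]

for start, end, label in drop_names_inside_addresses(predicted):
    print(label, "->", text[start:end])
```

With these inputs only “Albert Einstein” survives as a FULL_NAME; “David Murray” is dropped because it sits inside the ADDRESS span.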
I do think you’ll be making your task quite difficult if you distinguish names inside addresses like that. On the other hand, if you do label them as names, you’ll get the overlap issue: you wouldn’t be able to have the same model predict both the addresses and the names, because a single NER model assigns each token at most one entity.
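To illustrate the overlap issue, here is a small sketch of how span annotations become per-token BIO tags, which is the kind of representation a standard token-level NER tagger learns. It assumes whitespace tokenization and token-index spans; `spans_to_bio` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert token spans (start, end_exclusive, label) to one BIO tag per token.

    Raises ValueError when two spans claim the same token, because a
    per-token tagger can emit only one tag for each token.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            if tags[i] != "O":
                raise ValueError(f"token {i} is already {tags[i]!r}, cannot also be {label}")
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


# Assumed whitespace tokenization (commas split off by hand).
tokens = "Albert Einstein DD David Murray John Tower , Swindon , Wiltshire United Kingdom".split()

# Names outside the address only: representable.
print(spans_to_bio(len(tokens), [(0, 2, "FULL_NAME"), (2, 13, "ADDRESS")]))

# "David Murray" as FULL_NAME inside the ADDRESS: not representable.
try:
    spans_to_bio(len(tokens), [(0, 2, "FULL_NAME"), (3, 5, "FULL_NAME"), (2, 13, "ADDRESS")])
except ValueError as err:
    print("conflict:", err)
```

The second call fails on the token “David”, which would need to be both B-ADDRESS (or I-ADDRESS) and B-FULL_NAME at once.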
Have you considered annotating the elements of the address? You would have subcategories like building, number, country, etc. This might reduce the word-to-category ambiguity. The downside is that it might take much longer to annotate.
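One way this could work is sketched below: annotate only non-overlapping address parts next to FULL_NAME, then rebuild the full ADDRESS span afterwards by merging adjacent parts. The sub-labels (BUILDING, TOWN, COUNTY, COUNTRY), the choice to include “DD” in the BUILDING span, and the helper `merge_address_parts` are all assumptions for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char_exclusive, label)

# Hypothetical sub-labels for address parts; the real label set is a design choice.
ADDRESS_PARTS = {"BUILDING", "TOWN", "COUNTY", "COUNTRY"}


def char_span(text: str, substring: str, label: str) -> Span:
    """Build a span for the first occurrence of `substring` (illustration only)."""
    start = text.find(substring)
    return (start, start + len(substring), label)


def merge_address_parts(text: str, spans: List[Span], gap_chars: str = " ,") -> List[Span]:
    """Merge runs of address-part spans into one ADDRESS span.

    Two parts are merged when only characters from `gap_chars` (here spaces
    and commas) separate them. Other spans pass through unchanged.
    """
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if label not in ADDRESS_PARTS:
            merged.append((start, end, label))
            continue
        prev = merged[-1] if merged else None
        if prev and prev[2] == "ADDRESS" and all(c in gap_chars for c in text[prev[1]:start]):
            merged[-1] = (prev[0], end, "ADDRESS")  # extend the current address
        else:
            merged.append((start, end, "ADDRESS"))  # start a new address
    return merged


text = "Albert Einstein DD David Murray John Tower, Swindon, Wiltshire United Kingdom"

# Assumed annotations: no span overlaps another, so one model could learn them.
annotations = [
    char_span(text, "Albert Einstein", "FULL_NAME"),
    char_span(text, "DD David Murray John Tower", "BUILDING"),
    char_span(text, "Swindon", "TOWN"),
    char_span(text, "Wiltshire", "COUNTY"),
    char_span(text, "United Kingdom", "COUNTRY"),
]

for start, end, label in merge_address_parts(text, annotations):
    print(label, "->", text[start:end])
```

Because “David Murray” is only ever seen inside a BUILDING span, the model never has to decide between FULL_NAME and ADDRESS for those tokens, and the whole address is still recoverable by the merge step.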