We are using the Span Annotator to train a model that recognizes medical concepts contained in unstructured medical documentation.
Given that the model vectorizes each named entity based on a multi-token context window (4 tokens on each side of the entity, which is the default setting), and assuming that we are using a very large training corpus, do the resulting nearest-neighbor vectors in the trained model possess some form of semantic relatedness?
For example, would the vectors for the entities "heart attack" and "myocardial infarction" (these are synonyms) likely be found close to each other when compared using cosine similarity?
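
To make the question concrete, here is a minimal sketch of the comparison I have in mind, in plain Python. The three vectors are made-up placeholders, not values from the trained model, and cosine_similarity is a hypothetical helper written for illustration. How to pull the actual entity vectors out of the model is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical placeholder vectors, NOT output from any trained model.
heart_attack = [0.9, 0.1, 0.3]
myocardial_infarction = [0.8, 0.2, 0.4]
broken_ankle = [0.1, 0.9, -0.2]

# If synonyms end up near each other, the first score should be
# noticeably higher than the second.
print(cosine_similarity(heart_attack, myocardial_infarction))
print(cosine_similarity(heart_attack, broken_ankle))
```

If the answer to my question is yes, I would expect real vectors for synonyms to score close to 1.0 and unrelated concepts to score much lower.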
Thanks very much
C