Mapping relationships between named entities and unlabeled spans

Hi!

If you're dealing with overlapping spans, spancat is definitely the way forward. How you want to predict the entities here depends a bit on the data, though. While "knee injection" can be seen as two entities, "dental treatment" would be more awkward to split up. Then again, for sentences like the one around "buttocks", where the treatment and body part are not mentioned in one continuous span, you probably have to split them up into two entities anyway - there's no good solution otherwise.
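To make the options concrete, here is a minimal sketch in plain Python (no spaCy needed) of how "knee injection" could be annotated as one span, as two adjacent spans, or as nested spans. The example sentence, the label names `TREATMENT` and `BODY_PART`, and the helpers `Span`, `has_overlap` and `span_text` are all made up for illustration. Only the nested variant contains overlapping spans, which is the case that regular entity annotations can't represent and where spancat comes in.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span
    label: str  # hypothetical label name


def has_overlap(spans: List[Span]) -> bool:
    """Return True if any two spans share at least one token."""
    ordered = sorted(spans, key=lambda s: (s.start, s.end))
    # After sorting by start, any overlap shows up between neighbours.
    return any(a.end > b.start for a, b in zip(ordered, ordered[1:]))


def span_text(tokens: List[str], span: Span) -> str:
    """Join the tokens covered by a span back into a string."""
    return " ".join(tokens[span.start:span.end])


# Hypothetical example sentence, split on whitespace.
tokens = "She had a knee injection yesterday".split()

# Scheme A: treatment and body part annotated as one phrase.
combined = [Span(3, 5, "TREATMENT")]
# Scheme B: two separate, adjacent spans.
split = [Span(3, 4, "BODY_PART"), Span(4, 5, "TREATMENT")]
# Scheme C: the full phrase plus the body part nested inside it.
nested = combined + [Span(3, 4, "BODY_PART")]

for name, scheme in [("combined", combined), ("split", split), ("nested", nested)]:
    texts = [(span_text(tokens, s), s.label) for s in scheme]
    print(name, texts, "overlapping:", has_overlap(scheme))
```

Schemes A and B would also fit a regular NER setup, since their spans never share a token. Scheme C only works with a span-based component.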

Ultimately, it always boils down to this: what is the "easiest" way for a model to learn the information? If the entities are typically mentioned together and used as one phrase within the sentence, the model might find it easier to recognize them as one. A good proxy for what is "easiest" is to do some of the annotation yourself and try out both schemes. Which of the two feels more natural and is easier to apply? And which feels more intuitive to you as a human reader? Chances are high that this will correlate with what is easier for a model to learn, too (and thus, eventually, with higher accuracy).
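If you do end up training a small model on each scheme, one rough way to compare them is an exact-match F-score over spans on held-out examples. The sketch below is just an illustration of that idea: `span_f1` is a hypothetical helper, the labels are made up, and the gold and predicted spans are toy values, not real results.

```python
from typing import Set, Tuple

# (start token, end token, label); label names are hypothetical
SpanKey = Tuple[int, int, str]


def span_f1(gold: Set[SpanKey], predicted: Set[SpanKey]) -> float:
    """Exact-match F1: a prediction only counts if boundaries and label both match."""
    if not gold or not predicted:
        return 0.0
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data for one sentence under the "split" scheme: the model misses the treatment.
gold_split = {(3, 4, "BODY_PART"), (4, 5, "TREATMENT")}
pred_split = {(3, 4, "BODY_PART")}

# Toy data for the same sentence under the "combined" scheme.
gold_combined = {(3, 5, "TREATMENT")}
pred_combined = {(3, 5, "TREATMENT")}

print("split scheme F1:", round(span_f1(gold_split, pred_split), 3))
print("combined scheme F1:", round(span_f1(gold_combined, pred_combined), 3))
```

The two schemes count different things, so treat the numbers as a signal alongside how the annotation felt, not as a strict ranking.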

I might have strayed a bit from the original question - let me know if this helps or not!
