Ambiguity in NE tagging


I would like to raise the possibility of erroneously tagging a partially correct NE.

For example, suppose we are trying to tag the NE “ILLNESS” in the sentence below:

John suffered a left elbow injury because of a bad fall

“Left elbow injury” is an “ILLNESS” NE in this case. However, consider another example:

On the other hand, Tom did not suffer any left elbow injury because he did not fall.

For the above case, should we tag “left elbow injury” as an NE? Even though the span of text signifies a possible NE, once we take the context into account we don’t want our model to pick it up as an illness, because it did not occur.

In such cases, should we simply tag the span anyway, or should the context in which the span is used decide whether it gets tagged?

How do I go about handling such cases?


Ultimately this will depend on the data and what’s common, so it can be hard to guess which annotation policy will give you the best accuracy-for-effort ratio. I would say that generally, the model will have an easier time learning annotation policies where the same text is usually labelled the same way. So, if you can detect the negation in a separate step, you might find it better to always have the label ILLNESS apply to the phrase, and then have a text classification step over those sentences to detect whether it’s negated.

Here’s what that would look like in practical terms: you would first annotate all mentions of the illnesses, regardless of whether they’re negated. You’d then queue up a second annotation task to do negation detection on those sentences. You might find that a rule-based approach is sufficient for this, if there are only a few ways that the negation is expressed. I think a binary text classification step should be pretty easy to annotate and quite efficient, though.
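As a rough sketch of the rule-based option, assuming the negation is always expressed by a small set of cue words shortly before the entity: the cue list, the `is_negated` helper and the five-token window below are all made up for illustration, so you'd want to adjust them to what you actually see in your data.

```python
import re

# Hypothetical cue list; extend it with the negation words found in your data.
NEGATION_CUES = {"no", "not", "never", "without", "denies", "denied"}


def is_negated(text, start, window=5):
    """Return True if a negation cue occurs within `window` tokens
    before the entity that starts at character offset `start`."""
    # Normalise curly apostrophes so contractions like "didn’t" are caught.
    before = text[:start].lower().replace("’", "'")
    tokens = re.findall(r"[a-z']+", before)
    return any(
        tok in NEGATION_CUES or tok.endswith("n't")
        for tok in tokens[-window:]
    )


positive = "John suffered a left elbow injury because of a bad fall"
negative = "On the other hand, Tom did not suffer any left elbow injury because he did not fall."

print(is_negated(positive, positive.index("left elbow injury")))
print(is_negated(negative, negative.index("left elbow injury")))
```

A window-based rule like this only looks to the left of the entity and will miss negation that is expressed differently, which is where the binary text classification step would take over.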

If you like, you could do both phases of the annotation at once. You could have two entity types, for ILLNESS and NEG_ILLNESS. Then you could have a post-processing step which folds the labels together to make the NER dataset, and which also generates the text classification dataset.
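The post-processing step could look something like the sketch below. It assumes each annotated example is a dict with a `"text"` and a list of `"spans"` carrying `"start"`, `"end"` and `"label"`; that layout, the `fold_annotations` name and the `"negated"` field are assumptions for the example, not a fixed format.

```python
def fold_annotations(examples):
    """Split ILLNESS / NEG_ILLNESS annotations into two datasets:
    NER data where both labels become ILLNESS, and text classification
    data recording whether each mention was negated."""
    ner_data, textcat_data = [], []
    for eg in examples:
        folded_spans = []
        for span in eg["spans"]:
            if span["label"] in ("ILLNESS", "NEG_ILLNESS"):
                # Same text, same label: the NER model never sees negation.
                folded_spans.append(
                    {"start": span["start"], "end": span["end"], "label": "ILLNESS"}
                )
                # One classification example per mention.
                textcat_data.append(
                    {
                        "text": eg["text"],
                        "start": span["start"],
                        "end": span["end"],
                        "negated": span["label"] == "NEG_ILLNESS",
                    }
                )
            else:
                # Any other entity types pass through unchanged.
                folded_spans.append(dict(span))
        ner_data.append({"text": eg["text"], "spans": folded_spans})
    return ner_data, textcat_data


text = "Tom did not suffer any left elbow injury because he did not fall."
start = text.index("left elbow injury")
annotated = [
    {"text": text,
     "spans": [{"start": start, "end": start + len("left elbow injury"),
                "label": "NEG_ILLNESS"}]}
]
ner_data, textcat_data = fold_annotations(annotated)
```

Here `ner_data` holds the span labelled as plain ILLNESS, while `textcat_data` keeps the information that this particular mention was negated.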

The general thing to consider is that your annotation process can be different from how you divide the work for your machine learning pipeline, which can be different again from the way you actually output the labels for use in the rest of your application. You can perform rule-based transformations between all of these, because the presentation of the problem that’s most efficient to annotate is not necessarily the way that a model will find it easiest to learn.