Ambiguity in NE tagging


I would like to raise the possibility of erroneously tagging a partially correct NE.

For example, suppose we are trying to tag the NE “ILLNESS”. Consider the following sentence:

John suffered a left elbow injury because of a bad fall

Here, “left elbow injury” is an “ILLNESS” NE. However, consider another example:

On the other hand, Tom did not suffer any left elbow injury because he did not fall.

For the above case, should we tag “left elbow injury” as a correct NE? Even though the span of text looks like a possible NE, taking the context into account, we don’t want our model to pick it up as an illness, because the illness did not occur.

In such cases, should we simply accept the span as an entity, or does the context in which the span is used make a significant impact?

How do I go about handling such cases?


Ultimately this will depend on the data and what’s common, so it can be hard to guess which annotation policy will give you the best accuracy-for-effort ratio. I would say that generally, the model will have an easier time learning annotation policies where the same text is usually labelled the same way. So, if you can detect the negation in a separate step, you might find it better to always have the label ILLNESS apply to the phrase, and then have a text classification step over those sentences to detect whether it’s negated.

Here’s what that would look like in practical terms: you would first annotate all mentions of the illnesses, regardless of whether they’re negated. You’d then queue up a second annotation task to do negation detection on those sentences. You might find that a rule-based approach is sufficient for this, if there are only a few ways that the negation is expressed. I think a binary text classification step should be pretty easy to annotate and quite efficient, though.
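As a rough illustration of the rule-based option, here's a minimal sketch of a negation check. The cue list and the "any cue before the span" heuristic are illustrative assumptions, not something from this thread; a real corpus would need its own cue inventory:

```python
# Minimal rule-based negation check: flag an illness mention as negated
# if a negation cue appears anywhere before the span. Both the cue list
# and the "anywhere before" heuristic are illustrative assumptions.
NEGATION_CUES = {"not", "no", "never", "without", "denies", "denied"}

def is_negated(sentence: str, span_start: int) -> bool:
    """Return True if a negation cue occurs before the illness span."""
    prefix = sentence[:span_start].lower().split()
    return any(tok.strip(".,") in NEGATION_CUES for tok in prefix)

neg = "Tom did not suffer any left elbow injury because he did not fall."
pos = "John suffered a left elbow injury because of a bad fall."
print(is_negated(neg, neg.index("left elbow injury")))   # True
print(is_negated(pos, pos.index("left elbow injury")))   # False
```

If the negation patterns in your data are more varied than this, the binary text classification step mentioned above is the safer bet.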

If you like, you could do both phases of the annotation at once. You could have two entity types, for ILLNESS and NEG_ILLNESS. Then you could have a post-processing step which folds the labels together to make the NER dataset, and which also generates the text classification dataset.
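The post-processing step could look something like the sketch below. The record format here is an assumption, loosely modelled on Prodigy-style span annotations, and the output label names (`NEGATED`/`ASSERTED`) are placeholders:

```python
# Post-processing sketch: fold ILLNESS/NEG_ILLNESS annotations into
# (a) an NER dataset with a single ILLNESS label, and
# (b) a text classification dataset for negation detection.
# The record format is an assumption modelled on Prodigy-style spans.
def fold(examples):
    ner_data, textcat_data = [], []
    for eg in examples:
        spans, negated = [], False
        for span in eg["spans"]:
            if span["label"] == "NEG_ILLNESS":
                negated = True
            spans.append({**span, "label": "ILLNESS"})  # collapse both labels
        ner_data.append({"text": eg["text"], "spans": spans})
        textcat_data.append(
            {"text": eg["text"], "label": "NEGATED" if negated else "ASSERTED"}
        )
    return ner_data, textcat_data

examples = [{
    "text": "Tom did not suffer any left elbow injury.",
    "spans": [{"start": 23, "end": 40, "label": "NEG_ILLNESS"}],
}]
ner, textcat = fold(examples)
print(ner[0]["spans"][0]["label"], textcat[0]["label"])  # ILLNESS NEGATED
```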

The general thing to consider is that your annotation process can be different from how you divide the work for your machine learning pipeline, which can be different again from the way you actually output the labels for use in the rest of your application. You can perform rule-based transformations between all of these, because the presentation of the problem that’s most efficient to annotate is not necessarily the way that a model will find it easiest to learn.

Hi honnibal, two questions:

a) How would you annotate NEG_ILLNESS practically? On a multi-token or single-token basis?

  1. On the other hand, Tom did not NEG_ILLNESS suffer any left elbow injury ILLNESS because he did not fall.
  2. On the other hand, Tom did not suffer any NEG_ILLNESS left elbow injury ILLNESS because he did not fall.

b) In the case where there are multiple ILLNESS entities in one sentence and only some of them are negated, how would you proceed practically to link only the correct ones to the negation? Example:

Tom had flu ILLNESS but, on the other hand, Tom did not NEG_ILLNESS suffer any left elbow injury ILLNESS because he did not fall.

By the way, Prodigy is just the Swiss Army knife we needed to prove our insights and make them work in a reasonable time, thanks.

Well, I think it's important to consider the following questions separately:

  • What should the data look like during annotation?
  • What should the data look like when training the different models?
  • What should the data look like for evaluation?

For evaluation, you'll probably want something like:

You might want that during annotation as well -- that's one option. Another option is to have two annotation tasks, one where you have left elbow injury NEG_ILLNESS and another where you have negation predicted over a span of text.

The best way to set up the various representations is ultimately an empirical question --- it comes down to whatever works best. My general advice is to prefer text classification where possible, and try to factorise the decisions so that distinct pieces of information aren't all packed together into one label. You especially want to avoid labelling schemes where the decision depends on information in lots of different places, which is what happens when you have the negation case. In the negation case, you have the two decisions (is it an illness, is it negated), and the illness decision is made at the start of the phrase, while the negation decision depends on some word somewhere else. It's much better to have a lot of transforms to compose multiple models, if that lets you get an easier set of problems.

If you do decide to make the negation a text classification task, here's one way you can handle multiple illnesses. You could split the sentence up into multiple examples, so that you're predicting over spans of text that only have one illness. For instance, in:

Tom had flu but, on the other hand, Tom did not suffer any left elbow injury because he did not fall.

You might split this into "Tom had flu but, on the other hand, Tom did not suffer any" and "but, on the other hand, Tom did not suffer any left elbow injury because he did not fall". You could also use a windowed approach.
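One way to implement that split is to cut at the midpoint between consecutive illness spans, so each piece contains exactly one candidate. The midpoint strategy here is an illustrative assumption (a fixed-width window would work too):

```python
# Sketch: split a sentence into one window per illness span, so the
# negation classifier only ever sees one candidate illness at a time.
# The midpoint-split strategy is an illustrative assumption.
def split_per_entity(text, spans):
    """spans: sorted list of (start, end) character offsets."""
    pieces = []
    for i, (start, end) in enumerate(spans):
        left = 0 if i == 0 else (spans[i - 1][1] + start) // 2
        right = len(text) if i == len(spans) - 1 else (end + spans[i + 1][0]) // 2
        pieces.append(text[left:right].strip())
    return pieces

text = ("Tom had flu but, on the other hand, Tom did not suffer "
        "any left elbow injury because he did not fall.")
flu = (text.index("flu"), text.index("flu") + len("flu"))
inj = (text.index("left elbow injury"),
       text.index("left elbow injury") + len("left elbow injury"))
for piece in split_per_entity(text, [flu, inj]):
    print(piece)
```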

By the way, once you're dealing with only the negation case, you might find that a rule-based approach actually performs well. You could use the dependency parse to find which illness the negation word has a shorter path to, for instance. Even if there's 20 different ways people say the negation, that's really not that many rules to handle all the possibilities.


@honnibal thank you for your response,

we will keep your advice in mind and look for whatever fits best for our corpus and task.