Ultimately this will depend on the data and what’s common, so it can be hard to guess which annotation policy will give you the best accuracy-for-effort ratio. Generally, though, the model will have an easier time learning annotation policies where the same text is usually labelled the same way. So if you can detect the negation in a separate step, you might find it better to always apply the ILLNESS label to the phrase, and then run a text classification step over those sentences to detect whether it’s negated.
Here’s what that would look like in practical terms: you would first annotate all mentions of the illnesses, regardless of whether they’re negated. You’d then queue up a second annotation task to do negation detection on those sentences. You might find that a rule-based approach is sufficient for this, if there are only a few ways that the negation is expressed. I think a binary text classification step should be pretty easy to annotate and quite efficient, though.
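To make the rule-based option concrete, here’s a minimal sketch of cue-based negation detection over sentences that contain an annotated illness mention. The cue list, the whitespace tokenization, and the function name are all assumptions for illustration — in practice you’d tune the cues to how negation is actually expressed in your data.

```python
# Hypothetical cue list -- extend it based on what you see in your data.
NEGATION_CUES = {"no", "not", "without", "denies", "denied", "negative"}

def is_negated(sentence: str, mention: str) -> bool:
    """Return True if a negation cue appears before the mention.

    Naive whitespace tokenization, purely for illustration.
    """
    tokens = sentence.lower().split()
    mention_start = tokens.index(mention.lower().split()[0])
    return any(tok in NEGATION_CUES for tok in tokens[:mention_start])

print(is_negated("Patient denies chest pain", "chest pain"))   # True
print(is_negated("Patient reports chest pain", "chest pain"))  # False
```

If rules like this cover most of your negations, you may not need the text classification model at all — but the binary annotation task is cheap enough that it’s easy to check.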
If you like, you could do both phases of the annotation at once. You could have two entity types, ILLNESS and NEG_ILLNESS. Then you could have a post-processing step which folds the labels together to make the NER dataset, and which also generates the text classification dataset.
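That post-processing step might look something like the sketch below. It assumes annotations are stored as dicts with a "text" field and a list of "spans" (start, end, label) — the exact format will depend on your annotation tool.

```python
def split_datasets(examples):
    """Fold ILLNESS/NEG_ILLNESS annotations into an NER dataset and a
    text classification dataset. The record format is an assumption."""
    ner_data, textcat_data = [], []
    for eg in examples:
        folded_spans = []
        for span in eg["spans"]:
            negated = span["label"] == "NEG_ILLNESS"
            # Fold both entity types into a single ILLNESS label for NER.
            folded_spans.append({**span, "label": "ILLNESS"})
            # The original label becomes the text classification target.
            label = "NEGATED" if negated else "NOT_NEGATED"
            textcat_data.append((eg["text"], label))
        ner_data.append({"text": eg["text"], "spans": folded_spans})
    return ner_data, textcat_data
```

The nice thing about this arrangement is that the annotators only see the text once, but each model still gets the presentation of the problem it learns best from.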
The general thing to consider is that your annotation process can be different from how you divide the work for your machine learning pipeline, which can be different again from the way you actually output the labels for use in the rest of your application. You can perform rule-based transformations between all of these, because the presentation of the problem that’s most efficient to annotate is not necessarily the way that a model will find it easiest to learn.