You can try to include that, or strip out all HTML except for a few tags you care about, so your headlines end up looking like `<h4>Some subheadline</h4>` if you want to reflect the headline weight in your input data. However, I'm really not sure how useful this will be, especially for NER. The model has a pretty narrow context window on either side, so it mostly won't be able to take the signal from those tags into account when making its predictions.
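If you do want to try the "keep only a few tags" route, here's a rough sketch using only Python's built-in `html.parser`. The `KEEP_TAGS` allowlist and the `strip_html_except` helper are names I made up for illustration, and this drops all attributes on the tags it keeps, so treat it as a starting point rather than a finished preprocessor.

```python
from html.parser import HTMLParser

# Hypothetical allowlist: the tags whose presence you want to keep in the text.
KEEP_TAGS = {"h1", "h2", "h3", "h4"}


class TagFilter(HTMLParser):
    """Re-emit text content, keeping only allowlisted tags (without attributes)."""

    def __init__(self, keep):
        super().__init__(convert_charrefs=True)
        self.keep = keep
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attributes are deliberately discarded.
        if tag in self.keep:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references are already decoded here and are not re-escaped.
        self.parts.append(data)


def strip_html_except(html, keep=KEEP_TAGS):
    parser = TagFilter(keep)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


raw = '<div><h4 class="x">Some subheadline</h4><p>Body <b>text</b>.</p></div>'
print(strip_html_except(raw))
```

Note that dropped block-level tags like `<p>` leave no whitespace behind, so you may want to insert a space or newline for those before training on the result.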
It sounds like what you're really looking for is a way to include information about the formatting as features in your model. That needs some experimentation and a custom implementation, and you'd probably want to strip this information out of the text and attach it to the tokens instead of including the markup directly.
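To make the "strip it out and attach it to the tokens" idea a bit more concrete, here's a minimal sketch, again using only the standard library. It assumes a naive whitespace tokenizer and a made-up `FORMAT_TAGS` set, and `FormatExtractor` and `tokens_with_format` are hypothetical helpers, not part of any library. How you'd then feed these per-token tags into a model as features is the part that needs the custom implementation.

```python
import re
from html.parser import HTMLParser

# Hypothetical set of tags whose formatting you want to record per token.
FORMAT_TAGS = {"h1", "h2", "h3", "h4", "b", "i"}


class FormatExtractor(HTMLParser):
    """Collect plain text plus (start, end, tag) character spans."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = ""
        self.spans = []
        self._open = []  # stack of (tag, start offset) for unclosed tags

    def handle_starttag(self, tag, attrs):
        if tag in FORMAT_TAGS:
            self._open.append((tag, len(self.text)))

    def handle_endtag(self, tag):
        # Close the most recently opened tag with the same name, if any.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, start = self._open.pop(i)
                self.spans.append((start, len(self.text), tag))
                break

    def handle_data(self, data):
        self.text += data


def tokens_with_format(html):
    """Return (plain_text, [(token, [tags]), ...]) for a snippet of HTML."""
    parser = FormatExtractor()
    parser.feed(html)
    parser.close()
    tokens = []
    # Naive whitespace tokenization; swap in your real tokenizer's offsets.
    for m in re.finditer(r"\S+", parser.text):
        tags = sorted(
            {tag for start, end, tag in parser.spans
             if start < m.end() and m.start() < end}  # character overlap
        )
        tokens.append((m.group(), tags))
    return parser.text, tokens


text, tokens = tokens_with_format("<h4>Some subheadline</h4> Body <b>text</b> here")
for token, tags in tokens:
    print(token, tags)
```

The model then only ever sees the clean text, and the formatting lives alongside each token where you can experiment with it separately.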