Does Prodigy support HTML annotation for NER?

Annotating rendered HTML might sound appealing at first, but there's no easy answer for how the annotations should be resolved back to the underlying raw text, or for how to keep annotations consistent. After all, your model only ever sees the raw text.

I discuss some of these considerations in more detail on this thread:

One common solution is to write a function that takes the raw HTML, strips out the markup, tokenizes the remaining text and records each token's character offsets in the original raw HTML. This way, you can annotate plain text without markup and still map the character offsets of your annotations back to the original input.
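Here's a minimal sketch of that approach in plain Python, using only the standard library. It makes some simplifying assumptions. A regex is used to find tags, so it ignores `>` inside attribute values, doesn't skip the contents of `<script>`/`<style>` and doesn't decode entities like `&amp;`. It also tokenizes on whitespace instead of using a real tokenizer such as spaCy's. The names `strip_html`, `tokenize_html`, `span_to_html`, `html_start` and `html_end` are hypothetical, made up for this example.

```python
import re
from typing import Dict, List, Tuple

# Matches anything from '<' to the next '>' (tags, simple comments, doctypes).
# Simplification: '>' inside attribute values or comments is not handled.
TAG_RE = re.compile(r"<[^>]*>")
# Whitespace tokenizer used as a stand-in for a real tokenizer.
TOKEN_RE = re.compile(r"\S+")


def strip_html(html: str) -> Tuple[str, List[int]]:
    """Remove markup and return (text, offsets).

    offsets[i] is the character offset in `html` of text[i]. Each tag is
    replaced by a single space (mapped to the tag's start offset) so that
    words on either side of a tag are not glued together.
    """
    chars: List[str] = []
    offsets: List[int] = []
    pos = 0
    for match in TAG_RE.finditer(html):
        # Copy the visible text before this tag, keeping original offsets.
        for i in range(pos, match.start()):
            chars.append(html[i])
            offsets.append(i)
        # Stand-in separator for the tag itself.
        chars.append(" ")
        offsets.append(match.start())
        pos = match.end()
    # Copy any trailing text after the last tag.
    for i in range(pos, len(html)):
        chars.append(html[i])
        offsets.append(i)
    return "".join(chars), offsets


def tokenize_html(html: str) -> Tuple[str, List[Dict]]:
    """Tokenize the visible text of `html` on whitespace.

    Each token stores its span in the stripped text (start/end) and in the
    raw HTML (html_start/html_end). Tokens never contain the separator
    spaces, so each token maps to one contiguous slice of the raw HTML.
    """
    text, offsets = strip_html(html)
    tokens = []
    for m in TOKEN_RE.finditer(text):
        tokens.append({
            "text": m.group(),
            "start": m.start(),
            "end": m.end(),
            "html_start": offsets[m.start()],
            "html_end": offsets[m.end() - 1] + 1,
        })
    return text, tokens


def span_to_html(start: int, end: int, offsets: List[int]) -> Tuple[int, int]:
    """Map a token-aligned span [start, end) in the stripped text to the raw HTML."""
    return offsets[start], offsets[end - 1] + 1


if __name__ == "__main__":
    raw = "<p>Apple is based in <b>Cupertino</b>.</p>"
    text, tokens = tokenize_html(raw)
    for tok in tokens:
        # The stripped-text slice and the raw-HTML slice show the same token.
        print(repr(text[tok["start"]:tok["end"]]),
              repr(raw[tok["html_start"]:tok["html_end"]]))
```

Replacing every tag with a space keeps words on either side of a tag apart, but it also splits words that are interrupted by inline markup (e.g. `<b>Ber</b>lin`). Whether that's acceptable, or whether you only want to insert separators for block-level tags, depends on what your HTML looks like. Note that a span covering several tokens maps back to an HTML slice that may include the tags in between.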