nix411
(Nicolai Bjerre Pedersen)
May 15, 2019, 8:57pm
1
I have a task where I want to annotate spans in HTML reports. Is that possible? I am looking for something like ner.manual, except that my data has no plain text (no text field) and contains html instead.
It is the highlighting feature I need.
ines
(Ines Montani)
May 16, 2019, 10:38am
2
Hi! You might want to check out the following threads where I explain the considerations around this (and possible approaches) in more detail:
Hi! I think the problem here is that "html" input only works with the "html" annotation interface. If you’re annotating data with ner.manual, you’ll be selecting and labelling tokens – and that really only works on raw text. That’s why the recipe will only use the "text" key that’s present in your data.
If you pass in "<strong>hello</strong>", there’s no clear solution for how this should be handled. How should it be tokenized, and what are you really labelling here? The underlying markup or ju…
Annotating rendered HTML might sound appealing at first, but there’s actually not really an easy answer for how the annotations should be resolved back to the underlying raw text and how to ensure that annotations are consistent. After all, what your model will get to see is the raw text.
I discuss some of these considerations in more detail in this thread:
One common solution is to write a function that takes raw HTML, strips out the markup, tokenizes the text and stores each token’s charact…
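To make the idea in that last excerpt a bit more concrete, here is a minimal sketch using only the Python standard library. It strips the markup, splits the remaining text into tokens, and records each token's character offsets into the stripped text. The function name html_to_task, the regex tokenizer and the whitespace handling are all assumptions for illustration. They are not part of Prodigy, and since the excerpt is cut off, the exact offsets the original post had in mind may differ.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_task(html):
    """Hypothetical helper: raw HTML in, plain text plus token offsets out."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser

    # Assumption: join text nodes with a space and collapse whitespace runs.
    # This keeps block elements apart, but it also splits a word that is
    # only partly wrapped in a tag (e.g. "<b>hel</b>lo").
    text = re.sub(r"\s+", " ", " ".join(collector.parts)).strip()

    # Assumption: a naive tokenizer (word runs or single punctuation marks).
    # A real project would use the same tokenizer as the model.
    tokens = [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\w+|[^\w\s]", text))
    ]
    return {"text": text, "tokens": tokens}


if __name__ == "__main__":
    task = html_to_task("<p>Patient has <strong>acute</strong> pain.</p>")
    print(task["text"])
    for token in task["tokens"]:
        print(token)
```

The resulting dict has a "text" key, so it is raw text that token-based highlighting can work on, and every offset refers to that stripped text rather than to the original markup. If the annotations need to be mapped back onto the HTML later, the offsets of each text node in the source would have to be stored as well, which this sketch does not do.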