Newlines with span highlighting

Thanks, this is a good question and an interesting problem.

I just tested it with a simple string like "hello\nworld" and Prodigy’s default behaviour is definitely bad – it swallows the newline and, if it’s not surrounded by spaces, renders it visually as a space. This is due to the browser’s default rendering of newlines within an element’s text content. I think setting the text container to white-space: pre-wrap (render all whitespace and wrap lines) would be a more reasonable default. I can’t think of any negative side effects of this either.

Prodigy’s philosophy is to always show the user pretty much the exact input and not secretly swallow or hide any information that may have an influence on the model’s predictions. That’s also why HTML is never rendered by default, unless you explicitly tell Prodigy to do so. So I’d even consider the newline behaviour a bug. The good news is, it’ll be easy to fix in one line of CSS 🎉 I’ll make the fix and it’ll be included in the next release.

This will probably solve your problem, so you won’t need any custom HTML. If you do want to work around this in the meantime, it’s definitely possible – but the solution is not particularly satisfying. The easiest way would be to programmatically generate the HTML from the slices of the text defined via the span’s start and end index:

text = 'A very long text with several sentences\n and a newline'
start = 22  # character offsets of "several sentences"
end = 39

html = '{prefix}<mark>{sent}</mark>{suffix}'.format(
    prefix=text[:start], sent=text[start:end], suffix=text[end:])

task = {'text': text, 'spans': [{'start': start, 'end': end}],
        # replace newlines last so as not to throw off the span indices
        'html': html.replace('\n', '<br/>')}
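
If a task has more than one span, or the text may contain characters like < or &, the same idea can be wrapped in a small helper. This is only a rough sketch: spans_to_html is a hypothetical name and not part of Prodigy, and it assumes the spans don't overlap and use character offsets into the original text. Escaping with html.escape is my own addition, so that markup in the raw text is shown literally instead of being rendered.

```python
from html import escape


def spans_to_html(text, spans):
    """Wrap each span in <mark> tags and turn newlines into <br/>.

    Hypothetical helper, not part of Prodigy. Assumes non-overlapping
    spans given as character offsets into the original text.
    """
    parts = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s['start']):
        start, end = span['start'], span['end']
        if start < cursor or end < start or end > len(text):
            raise ValueError('spans must be in range and must not overlap')
        # Slice the original text first, then escape each piece, so the
        # span offsets always refer to the unmodified string.
        parts.append(escape(text[cursor:start]))
        parts.append('<mark>' + escape(text[start:end]) + '</mark>')
        cursor = end
    parts.append(escape(text[cursor:]))
    # Replace newlines last, once all slicing is done.
    return ''.join(parts).replace('\n', '<br/>')


text = 'A very long text with several sentences\n and a newline'
spans = [{'start': 22, 'end': 39}, {'start': 47, 'end': 54}]
task = {'text': text, 'spans': spans, 'html': spans_to_html(text, spans)}
```

The 'text' and 'spans' values stay untouched, so only the 'html' field carries the presentation changes.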

If you want the highlighted span to look nicer, you can use the --bg-highlight CSS variable to apply Prodigy’s default highlight background colour to the <mark> element, and add some padding on the sides:

<mark style="background: var(--bg-highlight); padding: 0 0.25em;">{sent}</mark>