Newlines included in entity spans


In the documentation (Annotation interfaces · Prodigy · An annotation tool for AI, Machine Learning & NLP) it says:

"As of v1.9, tokens containing only newlines (or only newlines and whitespace) are unselectable by default, so you can’t include them in spans you highlight. To disable this behavior, you can set "allow_newline_highlight": true in your prodigy.json."

The actual behaviour is not what I'd expect. I noticed in one of my projects (labelled by someone else) that there were a bunch of labelled entities with newlines and whitespace at the end. I did some testing with a very simple example of text with two words separated by a combination of spaces and a newline. In the ner.manual view, the newline is correctly greyed out, and I can't select it on its own. I can, however, select an adjacent word along with the whitespace/newline, resulting in an entity span that ends with whitespace.

My expectation for default behaviour is that selecting only whitespace, or an entity span beginning or ending with a whitespace token, should not be possible. Selecting an entity that spans a newline token should probably be possible, although it suggests poor formatting of the input text.

Hi Einar,

could you share the text example that you've used? I'd like to make sure I can reproduce what you experience locally, but I wasn't able to with some texts that I generated.


{"text":"this example has \n lots of whitespace"}

and the command prodigy ner.manual test blank:en example.jsonl --label ENT with prodigy==1.11.10 and spacy==3.5.0. I am able to select "has \n " or " \n lots" as an entity.
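For anyone who wants to check existing annotations for this problem, here is a minimal sketch that flags spans starting or ending with whitespace. It assumes each exported example is a dict with a "text" string and a "spans" list carrying character offsets under "start" and "end"; the function name find_whitespace_edged_spans and the sample spans are made up for illustration.

```python
def find_whitespace_edged_spans(example):
    """Return (start, end, text) for spans that begin or end with whitespace.

    Assumes `example` has a "text" string and an optional "spans" list whose
    items carry character offsets under "start" and "end".
    """
    text = example["text"]
    flagged = []
    for span in example.get("spans", []):
        span_text = text[span["start"]:span["end"]]
        # strip() removes spaces, tabs and newlines from both ends, so any
        # difference means the span has leading or trailing whitespace.
        if span_text != span_text.strip():
            flagged.append((span["start"], span["end"], span_text))
    return flagged


# Hypothetical annotation of the example text above.
example = {
    "text": "this example has \n lots of whitespace",
    "spans": [
        {"start": 13, "end": 19, "label": "ENT"},  # "has \n " (trailing whitespace)
        {"start": 19, "end": 23, "label": "ENT"},  # "lots" (clean)
    ],
}

for start, end, span_text in find_whitespace_edged_spans(example):
    print(start, end, repr(span_text))
```

Running something like this over each line of an exported JSONL file should give a rough idea of how many spans are affected.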

In my initial setup, I've been using Prodigy v1.11.7 and spaCy v3.4.1.

When I run your example, this is what the interface looks like:

[Screenshot: CleanShot 2023-02-03 at 11.23.48]

I'm unable to select the newline and it seems like everything is working as expected.

I then tried again with Prodigy v1.11.10 and spaCy v3.5 and got the same results.

This is making me wonder if there's perhaps a global ~/.prodigy/prodigy.json file around that might have the "allow_newline_highlight": true setting. Could you verify that?
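A quick way to check is a small script along these lines. This is only a sketch: it looks at the global path mentioned above plus a prodigy.json in the current working directory (an assumption about where a local config might live), and the helper name newline_highlight_setting is made up for this example.

```python
import json
from pathlib import Path


def newline_highlight_setting(config_path):
    """Return the "allow_newline_highlight" value from a JSON config file.

    Returns None if the file does not exist or the key is not set.
    """
    path = Path(config_path).expanduser()
    if not path.is_file():
        return None
    with path.open(encoding="utf8") as f:
        config = json.load(f)
    return config.get("allow_newline_highlight")


if __name__ == "__main__":
    # Candidate locations: the global file, then a local one in the
    # current directory (assumed location, adjust as needed).
    for candidate in ("~/.prodigy/prodigy.json", "prodigy.json"):
        print(candidate, "->", newline_highlight_setting(candidate))
```

If either line prints True, the default newline protection has been switched off by that file.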

If not, is there something "special" about how you're making your selection? What browser/operating system are you using?

Aha! After reading your comment I tried with different browsers: I have the issue with both Chrome and Edge, but it works as expected in Firefox!

More details:
OS: macOS 12.6.2, M1 MacBook Pro
Edge: 108.0.1462.46
Chrome: 69.0.3497.100
Firefox: 95.0

Ah yes. I'm now able to confirm. This is the view I see on Chrome:

On Firefox it is indeed working as expected. That means we're dealing with a frontend bug here.

Will log this internally as an issue and get back to you when I know more.

Thanks for reporting!