Does document.char_span require that you align with token boundaries?

For some values of n and m, document.char_span(n, m) returns None. Will it only return a Span object if the character offsets coincide with token boundaries? That seems to be what is happening, but I don’t see this mentioned in the documentation.

Yes, that’s correct. Doc.char_span returns None if the character indices don’t map to a valid span, that is, if they don’t line up with token boundaries. I’ve just updated the documentation to make this clearer. Thanks!
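To make the rule concrete, here is a rough pure-Python sketch of the boundary check. It is not spaCy's implementation: tokenize_with_offsets and char_span_sketch are hypothetical helpers, and the regex tokenizer is only a crude stand-in for spaCy's real one. It just illustrates why some (n, m) pairs give you a span and others give you None.

```python
import re


def tokenize_with_offsets(text):
    """Return (start, end, token_text) triples.

    Hypothetical stand-in for a real tokenizer: words and single
    punctuation characters become tokens, whitespace is skipped.
    """
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def char_span_sketch(text, start, end):
    """Return the covered text if both offsets sit on token boundaries, else None."""
    tokens = tokenize_with_offsets(text)
    token_starts = {s for s, _, _ in tokens}
    token_ends = {e for _, e, _ in tokens}
    # A span is only valid if it begins where some token begins
    # and ends where some token ends.
    if start < end and start in token_starts and end in token_ends:
        return text[start:end]
    return None


text = "Hello world, this is spaCy."
aligned = char_span_sketch(text, 0, 5)      # exactly the token "Hello"
two_tokens = char_span_sketch(text, 0, 11)  # the tokens "Hello" and "world"
misaligned = char_span_sketch(text, 0, 4)   # ends inside "Hello", so None
print(aligned, two_tokens, misaligned)
```

With spaCy itself, the practical upshot is to check the result before using it, for example by testing whether doc.char_span(n, m) is None, since offsets that cut through a token won't produce a Span.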

In the future, I think a better place for reporting spaCy-only problems like this one is the spaCy issue tracker or, for usage questions, StackOverflow. That way, more people will see it, and if something turns out to be a bug, it’ll be easier for us to track the changes on GitHub.

Will do. I’ll post spaCy-only questions on StackOverflow with the tag spacy. I already have a new one up there. :slight_smile: