Does document.char_span require that you align with token boundaries?

wpm · February 6, 2018, 11:19pm

For some values of n and m, document.char_span(n, m) returns None. Will it only return a Span object if the character offsets coincide with token boundaries? That seems to be what is happening, but I don’t see this mentioned in the documentation.

ines · February 7, 2018, 12:07am

Yes, that’s correct. Doc.char_span returns None if the character indices don’t map to a valid span, that is, if the start and end offsets don’t fall on token boundaries. I’ve just updated the documentation to make this clearer. Thanks!
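To make the rule concrete, here is a minimal pure-Python sketch of the check. It is not spaCy's actual implementation. The whitespace tokenizer and the function names (tokenize_whitespace, char_span_strict, char_span_contract, char_span_expand) are hypothetical stand-ins; real spaCy tokenization also splits off punctuation and more, so its boundaries will differ. The strict function mirrors the behaviour described above: a span comes back only if the start offset is the start of a token and the end offset is the end of a token. For comparison, the two extra helpers imitate the "contract" and "expand" options that later spaCy releases (v3) expose through the alignment_mode argument of Doc.char_span, where "strict" remains the default.

```python
import re
from typing import List, Optional, Tuple

# A token is (start_char, end_char), end exclusive, like Python slicing.
Token = Tuple[int, int]
# A token span is (first_token_index, last_token_index_exclusive).
TokenSpan = Tuple[int, int]


def tokenize_whitespace(text: str) -> List[Token]:
    """Hypothetical stand-in for a real tokenizer: split on whitespace only."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_span_strict(tokens: List[Token], start: int, end: int) -> Optional[TokenSpan]:
    """Return a token span only if start/end sit exactly on token boundaries."""
    starts = {s: i for i, (s, _) in enumerate(tokens)}
    ends = {e: i for i, (_, e) in enumerate(tokens)}
    if start not in starts or end not in ends:
        return None  # offsets fall inside a token or in whitespace
    first, last = starts[start], ends[end]
    if first > last:
        return None  # end comes before start
    return (first, last + 1)


def char_span_contract(tokens: List[Token], start: int, end: int) -> Optional[TokenSpan]:
    """Keep only tokens lying completely inside [start, end)."""
    inside = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
    return (inside[0], inside[-1] + 1) if inside else None


def char_span_expand(tokens: List[Token], start: int, end: int) -> Optional[TokenSpan]:
    """Widen to every token that overlaps [start, end) at least partially."""
    overlap = [i for i, (s, e) in enumerate(tokens) if s < end and e > start]
    return (overlap[0], overlap[-1] + 1) if overlap else None


if __name__ == "__main__":
    text = "Hello New York City"
    tokens = tokenize_whitespace(text)
    for start, end in [(6, 14), (7, 14)]:
        print(text[start:end], char_span_strict(tokens, start, end))
```

With "Hello New York City", offsets (6, 14) cover "New York" exactly and yield token span (1, 3), while (7, 14) starts inside "New" and yields None under the strict rule. The expand variant maps (7, 14) back to (1, 3), and contract narrows it to (2, 3), which is just "York".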

In the future, I think a better place to report spaCy-only problems like this one is the spaCy issue tracker, or StackOverflow for usage questions. That way, more people will see it, and if something turns out to be a bug, it’s easier for us to track the changes on GitHub.

wpm · February 7, 2018, 4:10pm

I’ll post spaCy-only questions on StackOverflow with the tag spacy. I already have a new one up there.
