Hello! I'm launching a Prodigy session in spans.manual mode with a loadable custom spaCy pipeline. The pipeline contains a tokenizer that splits the text on newline characters, so that each newline character ('\n') is its own token and any trailing or leading whitespace around a newline is also its own token. For example, after the pipeline runs on a piece of text, the resulting doc tokens would be: ['Result', 'Comment', '\n', '"5 spaces here"', 'Name']. That text has a newline after 'Result Comment', and the next line should read: '"5 spaces here" Name'. But when the processed document is loaded in the Prodigy UI using the spans.manual recipe, the leading whitespace in front of 'Name' on the second line is missing. How can I get the leading whitespace tokens to show in the Prodigy UI? Note: I typed "5 spaces here" to represent a whitespace token of 5 spaces, because the forum wasn't displaying the whitespace.
Thank you.
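For readers who want to reproduce the tokenization described above without the custom pipeline, here is a minimal sketch in plain Python. It is not the poster's tokenizer: the name `tokenize` is hypothetical, and it assumes that only whitespace at the start of a line becomes its own token, while ordinary spaces between words are treated as separators.

```python
import re

# Three kinds of pieces: a newline, a run of non-newline whitespace,
# or a run of non-whitespace characters.
PIECE_RE = re.compile(r"\n|[^\S\n]+|\S+")


def tokenize(text):
    """Hypothetical helper: newline and leading whitespace become tokens."""
    tokens = []
    for match in PIECE_RE.finditer(text):
        piece = match.group()
        is_blank_run = piece != "\n" and piece.isspace()
        # A blank run counts as "leading" when it starts the text
        # or directly follows a newline.
        at_line_start = match.start() == 0 or text[match.start() - 1] == "\n"
        if is_blank_run and not at_line_start:
            continue  # ordinary separator between words, not a token
        tokens.append(piece)
    return tokens


print(tokenize("Result Comment\n     Name"))
# ['Result', 'Comment', '\n', '     ', 'Name']
```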
Have you tried modifying honor_token_whitespace in the config? See this link. By default it's true, so perhaps you could modify it in your config?
Hi @ryanwesslen,
I set honor_token_whitespace to true, but the whitespace tokens still didn't load. For reference, most of the whitespace tokens in the document I'm loading come right after newlines. Is there any other setting, or any other possible reason, why they aren't appearing when I launch Prodigy with the processed document?
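One way to narrow this down might be to check whether the whitespace tokens are present in the task data itself, before the UI renders it. The sketch below builds token dictionaries with the keys Prodigy's manual interfaces use for pre-tokenized tasks ("text", "start", "end", "id", "ws"). The helper name `to_task_tokens` is hypothetical, and whether the UI then displays whitespace-only tokens is not something this sketch can confirm.

```python
def to_task_tokens(text, token_texts):
    """Hypothetical helper: align token strings to character offsets."""
    spans = []
    cursor = 0
    for tok in token_texts:
        start = text.index(tok, cursor)  # raises ValueError if misaligned
        end = start + len(tok)
        spans.append((tok, start, end))
        cursor = end

    tokens = []
    for i, (tok, start, end) in enumerate(spans):
        # "ws" is True only when there is a gap before the next token,
        # i.e. whitespace that is not itself a token.
        next_start = spans[i + 1][1] if i + 1 < len(spans) else end
        tokens.append(
            {"text": tok, "start": start, "end": end, "id": i, "ws": next_start > end}
        )
    return tokens


text = "Result Comment\n     Name"
task = {
    "text": text,
    "tokens": to_task_tokens(text, ["Result", "Comment", "\n", "     ", "Name"]),
}

# List the whitespace-only tokens that made it into the task.
for token in task["tokens"]:
    if token["text"].strip() == "":
        print(repr(token["text"]), token["start"], token["end"])
```

If the whitespace-only tokens show up here with the expected offsets, the tokenizer output is reaching the task intact and the difference is more likely on the rendering side.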
@ryanwesslen For additional reference, here's another portion of the token list, which I verified is what my custom spaCy pipeline outputs.
This is what is shown in the Prodigy view for this part of the document: