This is my second question for the day. Please pardon me.
I have input text that is already in one-sentence-per-line format. I would like to disable the parser's sentence boundary detection (which is the subject of a sister topic) and use the spaCy Sentencizer to create sentences from newlines.
As per the documentation, the Sentencizer takes a list of punctuation characters (punct_chars) that can be used as sentence boundaries. Hence, I am setting punct_chars = ['\n'], and it seems to work in the simple tests I have performed.
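For reference, here is a minimal sketch of my setup. It assumes spaCy v3 and a blank English pipeline (so no parser is involved at all); the sample text is made up for illustration.

```python
import spacy

# Blank English pipeline: tokenizer only, so no parser sets sentence boundaries.
nlp = spacy.blank("en")

# Treat a newline token as the only sentence-final character.
nlp.add_pipe("sentencizer", config={"punct_chars": ["\n"]})

# Hypothetical sample input: one sentence per line, with periods inside a line.
text = "Dr. Smith arrived late. He sat down.\nThe meeting started at noon."
doc = nlp(text)

# The newline token stays attached to the end of its sentence, so strip it.
sentences = [sent.text.strip() for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

In this sketch the periods inside the first line do not start a new sentence, because punct_chars replaces the default list rather than extending it. One thing I have not checked thoroughly: as far as I can tell, a run of blank lines is tokenized as a single token such as '\n\n', which would not match '\n', so input with empty lines may need extra entries in punct_chars.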
What I have not been able to figure out is the purpose of the overwrite flag. I have left it at its default, False, but when should this flag be set to True?
Also, do I need to worry about the scorer? I am going with the default value. I would appreciate any explanation (or a pointer to relevant documentation) of the purpose of the scorer in the Sentencizer.
I would greatly appreciate any insights.