This is my second question for the day. Please pardon me.
I have input text that is already in one-sentence-per-line format. I would like to disable the parser's sentence boundary detection (which is the subject of a sister topic) and use the spaCy Sentencizer to create sentences from newlines.
As per the documentation, the Sentencizer takes a list of punctuation characters (`punct_chars`) that can be used as sentence boundaries. Hence, I am setting `punct_chars = ['\n']`, and it seems to work in the simple tests I have performed.
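For reference, here is a minimal sketch of what my setup looks like. It assumes spaCy v3 and uses a blank English pipeline (tokenizer only, so there is no parser to disable); the sample text and variable names are made up for illustration.

```python
import spacy

# Blank English pipeline: tokenizer only, so no parser sets sentence boundaries.
nlp = spacy.blank("en")

# Treat the newline token as the only sentence-ending "punctuation".
nlp.add_pipe("sentencizer", config={"punct_chars": ["\n"]})

# Made-up sample input, one sentence per line.
text = "First sentence here\nSecond sentence here\nThird one"
doc = nlp(text)

# Each sentence span keeps its trailing newline token, hence the strip().
sentences = [sent.text.strip() for sent in doc.sents]
for s in sentences:
    print(s)
```

One thing I noticed: the match is on the whole token text, so as far as I can tell a blank line (which the tokenizer emits as a single `"\n\n"` token) would not count as a boundary unless `"\n\n"` is also listed in `punct_chars`.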
What I have not been able to figure out is the purpose of the `overwrite` flag. I have left it at its default, False, but when should this flag be set to True?
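My tentative understanding, which I would like confirmed, is that `overwrite` only matters when an earlier component has already set sentence boundaries. The sketch below is how I tried to check this: it runs a default sentencizer first and then my newline-based one, with `overwrite` off and on. The helper `split_lines`, the component name `newline_sentencizer`, and the sample text are my own inventions for the experiment.

```python
from typing import List

import spacy


def split_lines(overwrite: bool) -> List[str]:
    """Run two sentencizers in a row and return the resulting sentences."""
    nlp = spacy.blank("en")
    # First pass: default sentencizer, which splits on ".", "!", "?" and similar.
    nlp.add_pipe("sentencizer")
    # Second pass: newline-based sentencizer with the flag under test.
    nlp.add_pipe(
        "sentencizer",
        name="newline_sentencizer",
        config={"punct_chars": ["\n"], "overwrite": overwrite},
    )
    doc = nlp("One. Two\nThree")
    return [sent.text.strip() for sent in doc.sents]


kept = split_lines(overwrite=False)      # boundaries from the first pass survive
replaced = split_lines(overwrite=True)   # second pass replaces them
print(kept)
print(replaced)
```

In this experiment, the first call keeps the period-based split and the second call yields the newline-based split. If that reading is right, the default of False should be fine for me as long as nothing before the Sentencizer sets boundaries.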
Also, do I need to worry about the `scorer`? I am going with the default value. I would appreciate any explanation (or a pointer to relevant documentation) of the purpose of the `scorer` in the Sentencizer.
I would greatly appreciate any insights.