I'm assuming you're using spaCy here.
The dependency parser in the default models is trained to jointly predict sentence boundaries while it parses the rest of the sentence. The parser uses a transition-based formulation, which means it works as a state machine, and the learning problem is to predict which action to take given the current state. There are actions to push and pop words from a stack and a queue, to add arcs between words on the stack and the queue, and also to insert sentence breaks. A similar approach is described here: https://www.aclweb.org/anthology/P16-1181/
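As a rough illustration of the idea (this is a toy sketch, not spaCy's actual implementation; the function name, action names, and scripted action sequence are all invented for the example), a transition system with a sentence-break action might look like this:

```python
def run_transitions(words, actions):
    """Apply a scripted action sequence; a real parser *predicts* each action."""
    stack = []                        # partially processed word indices
    buffer = list(range(len(words)))  # word indices still to process
    arcs = []                         # (head, dependent) pairs
    sent_starts = {0}                 # indices of words that begin a sentence
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":    # front of buffer governs top of stack
            arcs.append((buffer[0], stack.pop()))
        elif action == "RIGHT-ARC":   # top of stack governs front of buffer
            arcs.append((stack[-1], buffer[0]))
            stack.append(buffer.pop(0))
        elif action == "BREAK":       # next buffer word starts a new sentence
            sent_starts.add(buffer[0])
            stack.clear()             # no arcs may cross the boundary

    return arcs, sent_starts


words = ["She", "left", ".", "He", "stayed", "."]
actions = ["SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "BREAK",
           "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
arcs, sent_starts = run_transitions(words, actions)
print(arcs)         # [(1, 0), (1, 2), (4, 3), (4, 5)]
print(sent_starts)  # {0, 3}
```

The point is just that segmentation is one more action among the parsing actions, so the model decides boundaries and structure together.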
The details of the SBD parsing model aren't that important to understand, however. The way I would recommend you improve the sentence breaks is to insert a component before the parser that sets some or all of the `token.is_sent_start` attributes. This attribute takes a ternary value in `(None, True, False)`, where `None` indicates the information is missing. The parser will respect values set beforehand and come up with a parse structure consistent with those boundaries: no dependency arcs will cross a preset sentence boundary, and sentence breaks will not be inserted on words set to `False`.
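For example, a minimal custom component might look like the sketch below (assuming spaCy v3's component registration API; the component name `custom_sentence_breaks` and the break-after-semicolon rule are placeholders for whatever signals your data provides). It uses a blank pipeline so it runs standalone; with a full model you would add it before the parser so the parser sees the pre-set values:

```python
import spacy
from spacy.language import Language

@Language.component("custom_sentence_breaks")
def custom_sentence_breaks(doc):
    for token in doc[:-1]:
        # Example rule only: start a new sentence after every semicolon.
        if token.text == ";":
            doc[token.i + 1].is_sent_start = True
    return doc

nlp = spacy.blank("en")
# With a loaded model, use: nlp.add_pipe("custom_sentence_breaks", before="parser")
nlp.add_pipe("custom_sentence_breaks")

doc = nlp("One clause; another clause.")
print([sent.text for sent in doc.sents])  # ['One clause;', 'another clause.']
```

You could also set `is_sent_start = False` on tokens where you want to forbid a break, and leave everything else as `None` so the parser decides the rest.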
There are also a number of third-party sentence segmenters in the spaCy universe. You could give those a try to see whether they work well on your data.