Train the dependency parser to detect sentence boundaries

Is it possible to train a DependencyParser to detect sentence boundaries only?
I read that there is a new component called Sentencizer, but it doesn't use a model, just fixed rules.

I think you probably don't want to use the dependency parser only for sentence boundaries. The dependency parser tries to build a sequence of trees over the whole input, and each connected tree is then marked as a sentence. If you only have the sentence boundaries, the model won't know how to connect the interior words, so most of the information it is supposed to learn will be unspecified. You could still train a model this way, but if all you want are the sentence boundaries and that's the only training information you're providing, the parser probably isn't the best choice.

Perhaps the entity recogniser would be a good fit? Train it to predict just the sentence-final punctuation marks, using a tag like U-END_SENTENCE.
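A minimal sketch of what the training tags could look like, assuming you already have gold sentence segmentation as lists of tokens. The helper `sentence_end_tags` is hypothetical: it marks the last token of each sentence (usually its final punctuation mark) as a single-token `U-END_SENTENCE` entity in BILUO notation and tags every other token `O`. Converting these tags into the exact training format your spaCy version expects is not shown.

```python
from typing import List

END_LABEL = "END_SENTENCE"  # label name from the suggestion above


def sentence_end_tags(sentences: List[List[str]]) -> List[str]:
    """Flatten gold-segmented sentences into one BILUO tag per token.

    The last token of each sentence becomes a single-token entity
    (U-END_SENTENCE); every other token is outside any entity (O).
    """
    tags: List[str] = []
    for sent in sentences:
        if not sent:
            continue  # skip empty sentences rather than emitting a stray tag
        tags.extend(["O"] * (len(sent) - 1))
        tags.append(f"U-{END_LABEL}")
    return tags


if __name__ == "__main__":
    sents = [["Hello", "world", "."], ["How", "are", "you", "?"]]
    print(sentence_end_tags(sents))
    # ['O', 'O', 'U-END_SENTENCE', 'O', 'O', 'O', 'U-END_SENTENCE']
```

Because every entity is one token long, only the `U-` prefix ever appears; the model never has to learn multi-token spans.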

Oh, sounds good! @honnibal
I didn't think about an NER model for punctuation marks! Awesome!
Then I will run it to set is_sent_start = True before "the real" NER model.
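On the prediction side, that step amounts to this: a token starts a sentence if it is the first token or if the previous token was tagged as a sentence end. Below is a plain-Python sketch with a hypothetical helper, `sent_starts_from_end_tags`. Writing the resulting flags onto `token.is_sent_start` inside a custom pipeline component placed ahead of the main NER model is assumed and not shown.

```python
from typing import List


def sent_starts_from_end_tags(tags: List[str], label: str = "END_SENTENCE") -> List[bool]:
    """Turn predicted end-of-sentence tags into per-token sentence-start flags.

    Token 0 always starts a sentence; any token that directly follows a
    token tagged U-<label> also starts one.
    """
    end_tag = f"U-{label}"
    starts: List[bool] = []
    prev_was_end = True  # the first token starts the first sentence
    for tag in tags:
        starts.append(prev_was_end)
        prev_was_end = tag == end_tag
    return starts


if __name__ == "__main__":
    tags = ["O", "O", "U-END_SENTENCE", "O", "O", "O", "U-END_SENTENCE"]
    print(sent_starts_from_end_tags(tags))
    # [True, False, False, True, False, False, False]
```

A sentence end on the final token does not produce an extra start, since there is no following token to flag.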


@honnibal It could work for \n too, right?

Yes, that’s right.
