Train dependency parser to detect sentence boundaries

damiano · May 2, 2019, 10:24am

Hello,
Is it possible to train a DependencyParser to detect sentence boundaries only?
I read that there is a new component called Sentencizer, but it does not use a model, just fixed regex rules.
Thanks

honnibal · May 3, 2019, 10:44am

I think you probably don’t want the dependency parser only for sentence boundaries. The dependency parser tries to build a sequence of trees over the whole input, with each connected tree then being marked as a sentence. If you only have the sentence boundaries, the model won’t know how to connect all of the interior words, so most of the information will be unspecified. You could still train a model this way, but if all you want are the sentence boundaries and that’s the only training information you’re providing, the parser model probably isn’t the best choice.

Perhaps the entity recogniser would be good? Try to predict just the sentence-final punctuation marks, using a tag like U-END_SENTENCE.

damiano · May 3, 2019, 11:50am

Oh sounds good! @honnibal
I did not think about a NER model for punctuation marks! Awesome!
Then I will run it to set is_sent_start = True before “the real” NER model.

Thanks!

damiano · May 3, 2019, 11:50am

@honnibal it could work for \n too, right?

honnibal · May 3, 2019, 2:01pm

Yes, that’s right.
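
The plan discussed in this thread, using predicted boundary tokens (punctuation marks or \n) to decide where sentences start, can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `sentence_start_flags` is hypothetical, and the boundary indices are assumed to come from the first NER model's predictions.

```python
from typing import Iterable, List


def sentence_start_flags(n_tokens: int, boundary_indices: Iterable[int]) -> List[bool]:
    """Return one flag per token: True if the token starts a sentence.

    Hypothetical helper: `boundary_indices` are the positions of tokens that
    were predicted as END_SENTENCE (e.g. ".", "?" or "\n" tokens).
    """
    flags = [False] * n_tokens
    if n_tokens:
        flags[0] = True  # the first token always opens a sentence
    for i in boundary_indices:
        if 0 <= i + 1 < n_tokens:
            flags[i + 1] = True  # the token after a boundary starts a new sentence
    return flags


tokens = ["Hi", ".", "Bye", "\n", "Ok"]
print(sentence_start_flags(len(tokens), [1, 3]))
```

In spaCy, flags like these would be written to each token's is_sent_start attribute in a custom pipeline component that runs before "the real" NER model or the parser, since sentence boundaries need to be set before the document is parsed.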
