Disable sentence boundary detection in spaCy Parser

nlpfan · November 6, 2022, 11:40pm

My text input to spaCy is already in one-sentence-per-line format, so I would like to switch off sentence boundary detection in the parser.

Is there a config setting that controls sentence boundary detection by the parser? If not, is there a workaround I can use to let the parser assign dependency tags without doing sentence boundary detection?

I would like to take advantage of the dependency tags generated by the parser so I believe excluding the parser from my pipeline is not the way to go.

Thanks!

koaning · November 7, 2022, 2:05pm

I think there are two options here.

1. You could set up your own custom model, maybe using pySBD, and save that to disk. You can then refer to this saved model in your ner recipes.
2. You could write a custom recipe that takes care of the sentences in the loop. It might use something like:

import srsly

examples = srsly.read_jsonl("path/to/file.jsonl")

def sentence_stream(example):
    # Use your own split_sentence implementation here
    for sentence in split_sentence(example["text"]):
        yield {"text": sentence}

# Flatten so the stream yields task dicts, not one generator per example
stream = (task for ex in examples for task in sentence_stream(ex))
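
Since your input is already one sentence per line, split_sentence can be very simple. Here is a minimal sketch, assuming each example's text holds newline-separated sentences. The sample data and the splitting rule are illustrative assumptions, not something taken from your dataset, and the sketch uses an in-memory list in place of the JSONL file:

```python
def split_sentence(text):
    """Treat every non-empty line as one sentence (assumed input format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def sentence_stream(examples):
    """Yield one {"text": ...} task per sentence across all examples."""
    for example in examples:
        for sentence in split_sentence(example["text"]):
            yield {"text": sentence}


# Hypothetical in-memory examples standing in for the JSONL file
examples = [
    {"text": "Total revenue 2021\nNet income up 4%"},
    {"text": "Operating costs\n\nSee note 3"},
]

stream = list(sentence_stream(examples))
print(stream)
```

Here sentence_stream takes the whole iterable of examples, so the result is already a flat sequence of task dicts.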

Let me know if this doesn't work or if I'm misinterpreting your problem.
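
For the first option, something along these lines might work. It is only a sketch: the component name one_sentence_per_doc is made up for illustration, and it uses a blank English pipeline so it runs without downloading a model. The idea is to preset the sentence boundaries so that each text is treated as a single sentence. With a trained pipeline, you would add the component before the parser, which should then respect the preset boundaries while still assigning dependency labels.

```python
import spacy
from spacy.language import Language


@Language.component("one_sentence_per_doc")  # hypothetical component name
def one_sentence_per_doc(doc):
    # Mark only the first token as a sentence start, so the whole
    # doc counts as a single sentence.
    for token in doc:
        token.is_sent_start = token.i == 0
    return doc


# A blank pipeline keeps the sketch free of model downloads. With a trained
# pipeline you would instead call
# nlp.add_pipe("one_sentence_per_doc", before="parser").
nlp = spacy.blank("en")
nlp.add_pipe("one_sentence_per_doc")

doc = nlp("Total revenue 2021. Net income up 4%.")
sentences = list(doc.sents)
print(len(sentences))
```

Note that a pipeline saved with nlp.to_disk only stores the component's name, so the code that registers the component has to be importable wherever the pipeline is loaded again.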

nlpfan · February 19, 2023, 4:16pm

I tried the first approach and it worked as expected. The text that I am working with is not well formed (it is more like a bunch of sentence fragments, such as text extracted from the cells of a table), so dependency tags are not that useful.

So I am now using the blank model (blank:en) with the ner.manual recipe and am very satisfied with the results.

Thanks very much for your help, @koaning
