What is the best way to extract a paragraph or long sentences from a text document?

Hello, I am working on an information extraction task with 5 different entities. Of these, 4 are real entities and the 5th is identification of a long piece of text. What is the best way to do this using Prodigy and spaCy? I am using the usual Prodigy and spaCy NER workflow for the first 4 entities, where I am progressing slowly. The 5th one is not actually an entity: it is the extraction of a paragraph or several long sentences. I can give a simple example. The article info comes from different sites, so the format is not consistent enough for rule-based extraction.

The word "abstract" does not always appear before the abstract starts; otherwise I would have taken every sentence after it. Also, sometimes the journal information is at the bottom of the text and the conclusion paragraph comes after the abstract. What is the best way to identify the abstract here? Can I continue treating this as an NER task?

Hi! I think highlighting very long spans by hand is definitely inefficient and unnecessarily complicated. Extracting such long spans is also not something you can solve well as an NER task.

Maybe you could try framing this as a text classification task and annotate at the sentence or paragraph level? This lets you click through each section and all you have to do is hit accept or reject, depending on whether the text you see is an abstract.
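A minimal sketch of that paragraph-level setup, assuming each document is a plain-text string with blank lines between paragraphs. The helper names (`split_paragraphs`, `to_tasks`, `to_jsonl`) and the `source` and `paragraph` meta keys are hypothetical; the only Prodigy convention relied on is JSONL input with one JSON object per line and a `"text"` key.

```python
import json
import re


def split_paragraphs(text):
    """Split raw text into non-empty paragraphs on blank lines."""
    parts = re.split(r"\n\s*\n", text)
    # Collapse internal line breaks and repeated whitespace in each paragraph.
    return [" ".join(p.split()) for p in parts if p.strip()]


def to_tasks(doc_id, text):
    """Turn one document into one annotation task per paragraph."""
    return [
        # "meta" keys are illustrative; they let you map a decision
        # back to its document and position later on.
        {"text": para, "meta": {"source": doc_id, "paragraph": i}}
        for i, para in enumerate(split_paragraphs(text))
    ]


def to_jsonl(tasks):
    """Serialize tasks as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(t) for t in tasks)


# Made-up example text, only to show the shape of the data.
example = (
    "Journal of Examples, Vol. 1\n\n"
    "We study long-span extraction\nin scraped articles.\n\n"
    "Conclusion: more work is needed."
)
tasks = to_tasks("article-1", example)
print(to_jsonl(tasks))
```

Saved to a file, the output could then be annotated with something like `prodigy textcat.manual abstracts_dataset paragraphs.jsonl --label ABSTRACT`, where the dataset name, file name and label are placeholders.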

Thanks, I will try it that way.

I've had success before framing this as a text classification task, so I can only second Ines' advice.
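Once paragraphs have been accepted or rejected, the long span can be put back together from the per-paragraph decisions. The sketch below assumes each annotated record carries an `"answer"` of `"accept"` or `"reject"` plus hypothetical `source` and `paragraph` meta keys identifying the document and position; `collect_abstracts` is an illustrative helper, not part of Prodigy or spaCy.

```python
from collections import defaultdict


def collect_abstracts(annotations):
    """Join accepted paragraphs per document, in their original order."""
    by_doc = defaultdict(list)
    for eg in annotations:
        if eg.get("answer") == "accept":
            meta = eg["meta"]
            by_doc[meta["source"]].append((meta["paragraph"], eg["text"]))
    # Sort by paragraph index so the abstract reads in document order.
    return {
        doc: "\n\n".join(text for _, text in sorted(paras))
        for doc, paras in by_doc.items()
    }


# Made-up annotations, only to show the expected record shape.
annotations = [
    {"text": "Second part.", "answer": "accept",
     "meta": {"source": "article-1", "paragraph": 2}},
    {"text": "Journal info.", "answer": "reject",
     "meta": {"source": "article-1", "paragraph": 0}},
    {"text": "First part.", "answer": "accept",
     "meta": {"source": "article-1", "paragraph": 1}},
]
abstracts = collect_abstracts(annotations)
print(abstracts["article-1"])
```

The same grouping would work on a trained classifier's predictions instead of manual answers, for example by treating any paragraph scored above a chosen threshold as accepted.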