We are looking to use Prodigy for extractive summarisation of long documents where each sentence has a label. We want to interactively show the user the summary for each label. As the user checks a label, the sentence is placed in the left pane under its section, which the user can edit.
Q1) How can we interactively show the selected sentences under the correct sections?
Q2) Is it possible to show the document with its formatting intact and ask the user to select important sentences for the summary by double-clicking any word in a sentence, then assign the correct label?
I'm unaware of a Prodigy interface that offers this functionality out of the box. So it sounds like you might be interested in designing a custom interface for your specific task. Given the high level of interaction you require, especially with the editable summary, this could be a lot of work.
So while the custom interface could be a valid option, I wonder if it's possible to simplify your interface instead. It sounds like one part of the problem is selecting sentences, which is something that a choice interface can do. Can you share anything about the final goal of the dataset? Is there a reason why the labels in the rightmost column are required?
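To make that concrete, here is a minimal sketch of a custom recipe built around the choice interface. The recipe name, file layout and section labels below are placeholders for illustration, not your actual scheme:

```python
import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "sentence-sections",
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("JSONL file with one sentence per line", "positional", None, str),
)
def sentence_sections(dataset, source):
    # Illustrative section labels -- replace with your own.
    options = [
        {"id": "FACTS", "text": "Facts"},
        {"id": "ARGUMENTS", "text": "Arguments"},
        {"id": "RULING", "text": "Ruling"},
    ]

    def add_options(stream):
        # Attach the same label options to every sentence task.
        for task in stream:
            task["options"] = options
            yield task

    return {
        "dataset": dataset,
        "stream": add_options(JSONL(source)),
        "view_id": "choice",
        "config": {"choice_style": "single", "choice_auto_accept": True},
    }
```

You'd run it with something like `prodigy sentence-sections court_sections ./sentences.jsonl -F recipe.py`.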
Thanks for your input.
The goal of the dataset is to create extractive summaries of court judgments. We have developed an ML model which predicts the section for each sentence (sequential sentence classification). We import pre-labelled data into Prodigy, where each sentence has an ML-generated label that the user can correct. The section (the rightmost label) gives structure to the judgment, so the summary is created in a structured way.
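For reference, each incoming task looks roughly like this; if I understand the choice format correctly, the model's prediction can be pre-selected via the "accept" field so the user only has to fix mistakes (the labels here are illustrative):

```python
# One sentence task: the ML-predicted section is pre-selected.
task = {
    "text": "The appellant filed the suit on 3 March 2019.",
    "options": [
        {"id": "FACTS", "text": "Facts"},
        {"id": "ARGUMENTS", "text": "Arguments"},
        {"id": "RULING", "text": "Ruling"},
    ],
    "accept": ["FACTS"],  # model prediction, shown pre-selected
}
```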
To explain Q2 a bit more: I am also wondering whether I should redesign the task as span marking instead of choice, because the formatting in the sentences (e.g. tabs before the start of a sentence) carries meaning that helps users make better decisions. A sketch of what I have in mind is below.
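This is a minimal sketch, assuming the spans_manual interface and a deliberately naive tokenizer that keeps tabs and newlines as their own tokens so the formatting stays visible. `make_task` and the `(start, end, label)` prediction format are hypothetical, not part of Prodigy:

```python
import re

def make_task(text, predicted_spans):
    """Build a spans_manual task, keeping tabs/newlines as tokens.

    predicted_spans: list of (char_start, char_end, label) tuples
    from the sentence classifier -- an illustrative format.
    """
    tokens = []
    # Naive tokenizer: tabs, newlines and non-space runs each become
    # a token, so leading tabs stay visible in the UI.
    for i, match in enumerate(re.finditer(r"\t|\n|\S+", text)):
        tokens.append({
            "text": match.group(),
            "start": match.start(),
            "end": match.end(),
            "id": i,
            # "ws" marks whether the token is followed by a space
            "ws": text[match.end():match.end() + 1] == " ",
        })
    spans = []
    for start, end, label in predicted_spans:
        covered = [t for t in tokens if t["start"] >= start and t["end"] <= end]
        if covered:  # snap the span to the tokens it covers
            spans.append({
                "start": covered[0]["start"],
                "end": covered[-1]["end"],
                "token_start": covered[0]["id"],
                "token_end": covered[-1]["id"],
                "label": label,
            })
    return {"text": text, "tokens": tokens, "spans": spans}

# For example, a sentence with a leading tab and one predicted label:
task = make_task("\tThe appeal is dismissed.", [(1, 25, "RULING")])
```

The predicted spans would then show up pre-highlighted for the user to correct.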
I could be wrong since I don't know the details of the tasks, but aren't "selecting sentences" and "assigning a label to each sentence" two separate problems that each deserve their own model? And given that they each deserve their own model, they may also each deserve their own annotation task.
There are two separate models. We have done the annotations for the sequential sentence classification separately and built that model. There is a separate model for the summarizer, and both models work in tandem. The goal is to create ground truth for the combined summary task, where the user has the ability to correct both the section and the sentence.
I think the best final advice I have here is to repeat what's said in the callout in our documentation on custom interfaces:
It’s recommended to only use the blocks interface for annotation tasks that absolutely require the information to be collected at the same time – for instance, comments or answers about the current annotation decision. While it may be tempting to create one big interface that covers all of your labelling needs like text classification and NER at the same time, this can often lead to worse results and data quality, since it makes it harder for annotators to focus. It also makes it more difficult to iterate and make changes to the label scheme of one of the components. You can always merge annotations of different types and create a single corpus later on, for instance using the data-to-spacy recipe.
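For the final merging step that the callout mentions, the invocation would look roughly like `prodigy data-to-spacy ./corpus --textcat sentence_sections --spancat summary_spans`; the output directory and dataset names here are placeholders, and the available component flags depend on your Prodigy version, so check `prodigy data-to-spacy --help`.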