We are looking to use Prodigy for extractive summarisation of long documents where each sentence has a label. We want to interactively show the user the summary for each label. As the user checks a label, the sentence is placed in the left pane under its section, which the user can edit.
Q1) How can we interactively show the selected sentences under the correct sections?
Q2) Is it possible to show the document with its formatting intact and ask the user to select important sentences for the summary by double-clicking any word in a sentence, then assign the correct label?
I'm unaware of a Prodigy interface that offers this functionality out of the box. So it sounds like you might be interested in designing a custom interface for your specific task. Given the high level of interaction you require, especially with the editable summary, this could be a lot of work.
So while the custom interface could be a valid option, I wonder if it's possible to simplify your interface instead. It sounds like one part of the problem is selecting sentences, which is something that a choice interface can do. Can you share anything about the final goal of the dataset? Is there a reason why the labels in the rightmost column are required?
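To make that concrete, here is a minimal sketch of a custom recipe built around the choice interface. The recipe name, file layout and section labels below are placeholders for illustration, not your actual scheme:

```python
import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "sentence-sections",
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("JSONL file with one sentence per line", "positional", None, str),
)
def sentence_sections(dataset, source):
    # Illustrative section labels -- replace with your own.
    options = [
        {"id": "FACTS", "text": "Facts"},
        {"id": "ARGUMENTS", "text": "Arguments"},
        {"id": "RULING", "text": "Ruling"},
    ]

    def add_options(stream):
        # Attach the same label options to every sentence task.
        for task in stream:
            task["options"] = options
            yield task

    return {
        "dataset": dataset,
        "stream": add_options(JSONL(source)),
        "view_id": "choice",
        "config": {"choice_style": "single", "choice_auto_accept": True},
    }
```

You'd run it with something like `prodigy sentence-sections court_sections ./sentences.jsonl -F recipe.py`.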
Thanks for your input.
The goal of the dataset is to create extractive summaries of court judgments. We have developed an ML model which predicts the section for each sentence (sequential sentence classification). We import pre-labelled data into Prodigy, where each sentence has an ML-generated label that the user can correct. The section (the rightmost label) gives structure to the judgment, so the summary is created in a structured way.
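For reference, each incoming task looks roughly like this; if I understand the choice format correctly, the model's prediction can be pre-selected via the "accept" field so the user only has to fix mistakes (the labels here are illustrative):

```python
# One sentence task: the ML-predicted section is pre-selected.
task = {
    "text": "The appellant filed the suit on 3 March 2019.",
    "options": [
        {"id": "FACTS", "text": "Facts"},
        {"id": "ARGUMENTS", "text": "Arguments"},
        {"id": "RULING", "text": "Ruling"},
    ],
    "accept": ["FACTS"],  # model prediction, shown pre-selected
}
```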
To explain Q2 a bit more: I am also wondering whether I should redesign the task as span marking instead of choice, because the formatting in the sentences (e.g. tabs before the start of a sentence) carries meaning that helps users make better decisions. A sketch of what I have in mind is below.
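This is a minimal sketch, assuming the spans_manual interface and a deliberately naive tokenizer that keeps tabs and newlines as their own tokens so the formatting stays visible. `make_task` and the `(start, end, label)` prediction format are hypothetical, not part of Prodigy:

```python
import re

def make_task(text, predicted_spans):
    """Build a spans_manual task, keeping tabs/newlines as tokens.

    predicted_spans: list of (char_start, char_end, label) tuples
    from the sentence classifier -- an illustrative format.
    """
    tokens = []
    # Naive tokenizer: tabs, newlines and non-space runs each become
    # a token, so leading tabs stay visible in the UI.
    for i, match in enumerate(re.finditer(r"\t|\n|\S+", text)):
        tokens.append({
            "text": match.group(),
            "start": match.start(),
            "end": match.end(),
            "id": i,
            # "ws" marks whether the token is followed by a space
            "ws": text[match.end():match.end() + 1] == " ",
        })
    spans = []
    for start, end, label in predicted_spans:
        covered = [t for t in tokens if t["start"] >= start and t["end"] <= end]
        if covered:  # snap the span to the tokens it covers
            spans.append({
                "start": covered[0]["start"],
                "end": covered[-1]["end"],
                "token_start": covered[0]["id"],
                "token_end": covered[-1]["id"],
                "label": label,
            })
    return {"text": text, "tokens": tokens, "spans": spans}

# For example, a sentence with a leading tab and one predicted label:
task = make_task("\tThe appeal is dismissed.", [(1, 25, "RULING")])
```

The predicted spans would then show up pre-highlighted for the user to correct.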
I could be wrong since I don't know the details of the tasks, but aren't "selecting sentences" and "assigning a label to each sentence" two separate problems that each deserve their own model? And given that they each deserve their own model, they may also each deserve their own annotation task.
There are two separate models. We have done the annotations for the sequential sentence classification separately and built that model. There is a separate model for the summarizer, and both models work in tandem. The goal is to create ground truth for the combined summary task, where the user has the ability to correct both the section and the sentence.
I think the best final advice I have here is to repeat what's said in the callout in our documentation on custom interfaces:
It’s recommended to only use the blocks interface for annotation tasks that absolutely require the information to be collected at the same time – for instance, comments or answers about the current annotation decision. While it may be tempting to create one big interface that covers all of your labelling needs like text classification and NER at the same time, this can often lead to worse results and data quality, since it makes it harder for annotators to focus. It also makes it more difficult to iterate and make changes to the label scheme of one of the components. You can always merge annotations of different types and create a single corpus later on, for instance using the data-to-spacy recipe.
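For the final merging step that the callout mentions, the invocation would look roughly like `prodigy data-to-spacy ./corpus --textcat sentence_sections --spancat summary_spans`; the output directory and dataset names here are placeholders, and the available component flags depend on your Prodigy version, so check `prodigy data-to-spacy --help`.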