Labelling dataset for extractive text summarization

salman1993 · October 8, 2020, 3:07pm

I was actually able to figure out Q1, but still unsure about Q2 and Q3.

Here is the recipe.py file:

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "extsumm",
    dataset=("The dataset to save to", "positional", None, str),
    file_path=("Path to texts", "positional", None, str),
)
def extsumm(dataset, file_path):
    """Annotate sentences of a document to be included in extractive summary or not."""
    stream = JSONL(file_path)     # load in the JSONL file

    return {
        "dataset": dataset,   # save annotations in this dataset
        "view_id": "choice",  # use the choice interface
        "stream": stream,
        'config': {'choice_style': 'multiple'},
    }

The annotation file - ext_summ_annotation.jsonl - looks like this:

{"text": "Tick imp sentences", "options": [{"id": 0, "text": "line 1"}, {"id": 1, "text": "summary line 2"}, {"id": 2, "text": "line 3"}]}
{"text": "Tick imp sentences", "options": [{"id": 0, "text": "summary x 1"}, {"id": 1, "text": "x 2"}, {"id": 2, "text": "summary x 3"}, {"id": 3, "text": "x 4"}]}

You can run it using: prodigy extsumm extsumm_dataset ext_summ_annotation.jsonl -F recipe.py

Topic		Replies	Views
Extractive summarization with labels	5	497	June 20, 2022
text classification usage , textcat	7	1126	October 7, 2019
Datasets and using pre-annotated data Getting Started usage , solved	23	5514	November 15, 2020
.jsonl-formatted file, mark as either category a, b, or c (mutually exclusive) and save to database- how? usage , textcat , solved	2	477	August 27, 2019
Annotate multiple JSONL into multiple Datasets usage , database , solved , streams	2	549	October 7, 2021

Labelling dataset for extractive text summarization

Related topics