Labelling dataset for extractive text summarization

justindujardin · October 15, 2020, 5:48pm

@salman1993 thanks for the detailed write-up, it helped me reproduce your problem!

The trouble seems to be that you're using an overlapping feed, which excludes items based on your session name. That setting (feed_overlap) exists to support multiple named annotators, but you're opening the browser without a session name. Compare the two URLs:

http://localhost:8080 opens the browser using the default session, whose name is generated from the date and time at which the server starts. When you use this with named sessions, you effectively get a new session each time you restart the server. This is why you keep seeing the same questions.
http://localhost:8080/?session=my_name opens the browser with a fixed session name, so restarting the server doesn't cause the annotations to start from the beginning.
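
If you hand out links to several annotators, a small helper can build the named-session URL for each of them. This is only a sketch: session_url is a hypothetical helper (not part of Prodigy), and the base URL assumes the default local server address shown above.

```python
from urllib.parse import urlencode


def session_url(name, base="http://localhost:8080"):
    """Build the app URL for a named session (hypothetical helper, not a Prodigy API)."""
    # urlencode escapes any characters that are not safe in a query string
    return f"{base}/?{urlencode({'session': name})}"


print(session_url("my_name"))
```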

So you can either use a named session for your annotations, or disable the overlapping feed and get the behavior you want while still visiting the first URL. To do that, set feed_overlap to False in your recipe:

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import set_hashes


@prodigy.recipe(
    "extsumm",
    dataset=("The dataset to save to", "positional", None, str),
    file_path=("Path to texts", "positional", None, str),
)
def extsumm(dataset, file_path):
    """Annotate sentences of a document to be included in extractive summary
    or not."""

    def get_stream():
        stream = JSONL(file_path)  # load in the JSONL file
        for eg in stream:
            eg["text"] = "Tick messages to be included in summary"
            eg = set_hashes(eg, input_keys=("id",))  # CHANGE (trailing comma makes this a tuple, not the string "id")
            yield eg

    return {
        "dataset": dataset,  # save annotations in this dataset
        "view_id": "choice",  # use the choice interface
        "stream": get_stream(),
        "config": {
            "choice_style": "multiple",
            "feed_overlap": False,
        },
    }
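
As far as I can tell, the reason for hashing on "id" is that the recipe overwrites "text" with the same prompt for every example, so a hash derived from the text could not tell the examples apart. The sketch below illustrates that idea with a stand-in hash function. toy_input_hash is hypothetical and is not how Prodigy computes its hashes; it only assumes each example carries a unique "id".

```python
import hashlib
import json


def toy_input_hash(eg, input_keys):
    """Stand-in for an input hash: digest only the selected keys (not Prodigy's algorithm)."""
    payload = json.dumps({key: eg.get(key) for key in input_keys}, sort_keys=True)
    return hashlib.md5(payload.encode("utf8")).hexdigest()


prompt = "Tick messages to be included in summary"
first = {"id": 1, "text": prompt}
second = {"id": 2, "text": prompt}

# Hashing on the shared prompt text makes both examples look identical
same_by_text = toy_input_hash(first, ("text",)) == toy_input_hash(second, ("text",))
# Hashing on the unique id keeps them distinct
same_by_id = toy_input_hash(first, ("id",)) == toy_input_hash(second, ("id",))
print(same_by_text, same_by_id)
```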
