Assigning sentiments to sentences on a scale

Hi

I have a dataset (in Excel so far) with feedback from students. My goal is to split the feedback into sentences and then rate each sentence on a scale from 1-7 for 12 different emotions. So for every sentence, I would want to assess on a scale from 1-7 whether (and how strongly) I feel, e.g., sympathy. And then repeat this for the other emotions and sentences.
Would something like that be possible with Prodigy and if so, how?
Thanks a lot for the help!
Cheers

Hi! Yes, this should definitely be possible with a custom recipe and the choice interface. I think it just comes down to how you want to break it up and how you want to present the question. Here's a simple example of a sentiment labelling recipe with multiple options: https://prodi.gy/docs/custom-recipes#example-choice

While you could, in theory, do all emotions and all scales in one task with multiple blocks, I do think it makes sense to only focus on one question at a time and queue up the same text multiple times with different objectives. So you could either do each example multiple times in a row, once for each emotion, or grouped by emotion. And then each example gets "options" for the scale from 1-7.

So your streams could look like this:

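# Grouped by text
for eg in examples:
    for emotion in EMOTIONS:
        task = dict(eg)  # copy, so each yielded task keeps its own label
        task["label"] = emotion
        task["options"] = options
        yield task

# Grouped by emotion
# (examples needs to be a list here, because it is looped over once per emotion)
for emotion in EMOTIONS:
    for eg in examples:
        task = dict(eg)  # copy, so each yielded task keeps its own label
        task["label"] = emotion
        task["options"] = options
        yield task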

Intuitively, grouping by emotion somehow feels better to me? This would let the annotator get into the mindset of one emotion at a time and it could potentially make the annotators more consistent? (Which is often a problem that occurs if you're asking people to label on a scale, as the distinctions can be quite subtle and even the same annotator may vary with their decisions over time.) But maybe the other way of doing it makes more sense in your case, it really depends.

Btw, the "options" added to the examples allow setting a "text" and "id" value. So you can use numbers as the IDs and work with those internally, but also choose a more human-friendly description (or even emoji! :smiley:) to make it easier to distinguish the different options.
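To make the "id" and "text" idea concrete, here is a minimal sketch in plain Python of how the options for a 1-7 scale could be generated. The helper name build_scale_options and the end-point wording are made up for illustration and are not part of Prodigy; only the {"id": ..., "text": ...} shape comes from the description above.

```python
# Hypothetical helper (not a Prodigy function): builds the "options" list
# for a numeric scale. The end-point descriptions are placeholders.
def build_scale_options(low=1, high=7, low_text="not at all", high_text="extremely"):
    options = []
    for value in range(low, high + 1):
        if value == low:
            text = f"{value} = {low_text}"
        elif value == high:
            text = f"{value} = {high_text}"
        else:
            text = str(value)
        # Numeric ID for internal use, human-friendly text for the annotator
        options.append({"id": value, "text": text})
    return options


options = build_scale_options()
for option in options:
    print(option)
```

The numeric IDs are what you would work with after annotating, while the text is only there to help the annotator read the scale.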

Hi Ines

Many thanks for your fast reply and help. I tried to code my own recipe as you suggested, based on the choice interface. Since I haven't cleaned and prepared my data yet, I am trying my code with your example data set (news_headlines).
This is the code for the recipe:

import prodigy

@prodigy.recipe(
    "empathy_classification",
    dataset=("news_headlines", "positional", None, str),
    view_id=("choice", "option", "v", str)
)
def empathy_classification(dataset, view_id="text"):
    stream = JSONL (news_headlines)
    stream = emotions(stream)

    def emotions(stream):
        for emotion in EMOTIONS:
            for eg in examples:
                eg["sympathetic", "warm"] = emotion
                eg["1 = not at all", "7 = extremely"] = options
                yield eg

    def update(examples):
        # This function is triggered when Prodigy receives annotations
        print(f"Received {len(examples)} annotations!")

    return {
        "dataset": news_headlines,
        "view_id": "choice",
        "stream": stream,
        "update": update
    }

In the command line, I would put: python -m prodigy empathy_classification loom_feedbacks news_headlines.jsonl -F recipe.py
It gives me an error though, saying that news_headlines.jsonl is an unrecognized argument. I tried some different variations of the file path (e.g. ./news_headlines) but haven't resolved the problem yet.
Do you know what the problem could be?

Thanks and kind regards

I think what's happening here is that your recipe is expecting different arguments: in your function, you have dataset and view_id (which is an option, so it'll be --view-id on the command line). But then when you call the recipe from the command line, you're passing in 2 positional arguments: loom_feedbacks news_headlines.jsonl. So it doesn't know what to do with that second positional argument. Instead, I think you want it to look something like this:

@prodigy.recipe(
    "empathy_classification",
    dataset=("Name of dataset to add annotations to", "positional", None, str),
    source=("Input data file", "positional", None, str)
)
def empathy_classification(dataset, source):
    stream = JSONL(source)

Also note that the @prodigy.recipe decorator describes the recipe arguments and doesn't set their values. You can read more about this here.
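If it helps to see the "unrecognized argument" situation in isolation, here is a small analogy using Python's standard argparse module. This is not how Prodigy parses recipe arguments internally; it is only a sketch of the general idea that a command declaring one positional argument has nowhere to put a second one, while a command declaring two accepts both.

```python
import argparse

# Analogy only: one positional argument plus one option, like the first recipe
one_positional = argparse.ArgumentParser(prog="empathy_classification")
one_positional.add_argument("dataset")
one_positional.add_argument("--view-id", default="choice")

# parse_known_args returns the leftovers instead of exiting with an error
args, leftovers = one_positional.parse_known_args(
    ["loom_feedbacks", "news_headlines.jsonl"]
)
print(args.dataset, leftovers)

# Two declared positional arguments: both values now have a place to go
two_positionals = argparse.ArgumentParser(prog="empathy_classification")
two_positionals.add_argument("dataset")
two_positionals.add_argument("source")
fixed = two_positionals.parse_args(["loom_feedbacks", "news_headlines.jsonl"])
print(fixed.dataset, fixed.source)
```

In the first parser, the file name ends up in the leftovers because nothing was declared to receive it. In the second, it is assigned to source.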

There are a few more problems here that mean that your recipe won't work. For instance, you still need to define the EMOTIONS and the options that you want to assign, and you also want to loop over for eg in stream. And the string in eg[...] is the name of the key that you're assigning to, so that needs to be eg["options"] and eg["label"]. So you're setting the value of {"options": ...} to a list of options, and the value of "label" to the emotion.

Check out the script here for a working example. It shows how the options are defined and how they're added to the stream. You should be able to test that out-of-the-box with any input file: Custom Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP

Maybe you also find my video on custom recipes helpful:

Thanks Ines!
I think I figured out how to work with the recipe and the command line. However, I am still unsure about how to include the different emotions in the code. Would my code work if I change "emotion" in my for loop to "label", as I have defined label as the different emotions? Or how would you suggest setting the label, and where would I include the different emotions then? Also, I need to define EMOTIONS at some point, but I don't know how to define it.

So far I got this:

import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "empathy_classification",
    dataset=("Name of dataset to annotate to", "positional", None, str),
    source=("Input data file", "positional", None, str)
)
def empathy_classification(dataset, source):
    stream = JSONL (source)
    stream = add_options (stream)
    stream = add_label(stream)


    return {
        "dataset": dataset,
        "view_id": "choice",
        "stream": stream,
    }

def add_options(stream):
    options = [
        {"id": "1", "text": "Gar nicht"},
        {"id": "2", "text": "Kaum"},
        {"id": "3", "text": "Meist nicht"},
        {"id": "4", "text": "Neutral"},
    ]

def add_label(stream):
    label = [
        {"id": "sympathetic", "text": "sympathetic"},
        {"id": "moved", "text": "moved"},
        {"id": "compassionate", "text": "compassionate"},
        {"id": "tender", "text": "tender"},
    ]

    for emotion in EMOTIONS:
        for eg in stream:
            eg["label"] = emotion
            eg["options"] = options
            yield eg

I am still rather new to programming and I'm really sorry to bother you :see_no_evil: but I would really like to be able to annotate my dataset according to the different emotions and the scale.

Thanks again and cheers

The "label" in this case would be a string, for example "sympathetic".

Here's a simple standalone example you can run in your terminal to see the idea in action:

emotions = ["sympathetic", "moved", "compassionate", "tender"]
options = [
    {"id": "1", "text": "Gar nicht"},
    {"id": "2", "text": "Kaum"},
    {"id": "3", "text": "Meist nicht"},
    {"id": "4", "text": "Neutral"},
]

# Those would later be loaded from your file
stream = [{"text": "hello world"}, {"text": "this is a text"}]

for emotion in emotions:
    for eg in stream:
        eg["label"] = emotion
        eg["options"] = options
        print(eg)

The result should look something like this:

{'text': 'hello world', 'label': 'sympathetic', 'options': [{'id': '1', 'text': 'Gar nicht'}, {'id': '2', 'text': 'Kaum'}, {'id': '3', 'text': 'Meist nicht'}, {'id': '4', 'text': 'Neutral'}]}
{'text': 'this is a text', 'label': 'sympathetic', 'options': [{'id': '1', 'text': 'Gar nicht'}, {'id': '2', 'text': 'Kaum'}, {'id': '3', 'text': 'Meist nicht'}, {'id': '4', 'text': 'Neutral'}]}
{'text': 'hello world', 'label': 'moved', 'options': [{'id': '1', 'text': 'Gar nicht'}, {'id': '2', 'text': 'Kaum'}, {'id': '3', 'text': 'Meist nicht'}, {'id': '4', 'text': 'Neutral'}]}
...

That's the format you want, because it allows Prodigy to render the text with a label on top (the emotion) and a number of multiple-choice options. So your recipe could look something like this:

import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "empathy_classification",
    dataset=("Name of dataset to annotate to", "positional", None, str),
    source=("Input data file", "positional", None, str),
)
def empathy_classification(dataset, source):
    stream = JSONL(source)
    stream = add_options(stream)

    return {
        "dataset": dataset,
        "view_id": "choice",
        "stream": stream,
    }


def add_options(stream):
    emotions = ["sympathetic", "moved", "compassionate", "tender"]
    options = [
        {"id": "1", "text": "Gar nicht"},
        {"id": "2", "text": "Kaum"},
        {"id": "3", "text": "Meist nicht"},
        {"id": "4", "text": "Neutral"},
    ]
    for emotion in emotions:
        for eg in stream:
            eg["label"] = emotion
            eg["options"] = options
            yield eg

And the result in your browser would then look like this:

Thanks! This works perfectly!
Happy weekend

Hi @ines

I started annotating my data, but somehow the loop isn't working. As soon as I had annotated all the sentences for the first sentiment, it wouldn't change to the second one and showed me "No tasks available" instead.
Do you know why Prodigy isn't changing to the second emotion?

thanks
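
One likely cause, assuming the loader hands back the examples lazily as a generator: the inner for eg in stream loop in add_options uses up the generator during the first emotion, so nothing is left for the second one. The sketch below reproduces this in plain Python without Prodigy. load_examples is a made-up stand-in for the file loader, and add_options_all_emotions is only one possible way around the problem, not a confirmed fix from this thread.

```python
import copy


def load_examples():
    # Stand-in for a file loader that yields one task at a time
    yield {"text": "hello world"}
    yield {"text": "this is a text"}


def add_options_single_pass(stream, emotions, options):
    # Same nested loop as in the recipe: the generator is consumed during
    # the first emotion, so later emotions find nothing left to loop over
    for emotion in emotions:
        for eg in stream:
            eg["label"] = emotion
            eg["options"] = options
            yield eg


def add_options_all_emotions(stream, emotions, options):
    # Read the examples into a list once, so they can be looped over repeatedly
    examples = list(stream)
    for emotion in emotions:
        for eg in examples:
            task = copy.deepcopy(eg)  # separate task per emotion
            task["label"] = emotion
            task["options"] = options
            yield task


emotions = ["sympathetic", "moved", "compassionate", "tender"]
options = [{"id": "1", "text": "Gar nicht"}, {"id": "2", "text": "Kaum"}]

broken = list(add_options_single_pass(load_examples(), emotions, options))
fixed = list(add_options_all_emotions(load_examples(), emotions, options))
print(len(broken), len(fixed))
```

With two texts and four emotions, the first version produces only 2 tasks (all labelled with the first emotion), while the second produces 8. Reading everything into a list does load the whole file into memory, which should be fine for a dataset of student feedback but is a trade-off for very large inputs.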