How to build an ABSA (Aspect-Based Sentiment Analysis) annotation recipe with Prodigy?

v-juma1 · May 24, 2019, 12:05am

Hi,
thanks to the Prodigy team, it is a great tool.
I am trying to build an ABSA annotation recipe. I have a one-line text like this:

“I took the Lipitor and it improved my situation after a week.”

I want to add the annotation DRUG{effect: positive} to the word “Lipitor”. That is, when I tag “Lipitor” as “DRUG”, I can choose “positive” or “negative” as the value of the entity’s attribute. How can I build this recipe? Are there any suggestions?

thanks,
majun

ines · May 24, 2019, 9:46am

Hi! You probably want to be doing this in separate steps: first, make sure you get the entities / top-level spans annotated correctly and adjust your label scheme if necessary (e.g. if you realise that some things are ambiguous).

Next, you can export the created annotations and maybe use the choice interface to annotate the sentiment – see here for an example. In your recipe, you could do something like this to create an example with different options for each span in your pre-annotated data:

import copy

def get_stream():
    for eg in stream:  # your pre-annotated examples
        spans = eg.get("spans", [])  # the entity spans
        for span in spans:
            new_eg = copy.deepcopy(eg)
            new_eg["spans"] = [span]  # one span per example
            new_eg["options"] = options  # add choice options
            yield new_eg

Edit: I had a typo in there – it said yield eg instead of yield new_eg.

This will give you one annotation example per entity you’ve highlighted, and the option to select one (or more, depending on your configuration) sentiment values. The advantage of doing it in two steps is also that it makes it easier to focus – first, all you need to think about is whether something is a drug. In the second step, all you need to think about is the sentiment. It also gives you a chance to start over and revise the label scheme if it turns out you need more fine-grained attributes.

v-juma1 · May 29, 2019, 9:54am

Thanks a lot! I will try this pipeline.

I have a list of entity words (about 20 thousand), and I want to annotate my text with these entities. The annotated data will be used in the next step for sentiment annotation. If all the entity words can be found in my text, there is no need for a human to annotate them in the Prodigy web application, but I still need the data in the annotated format so the entities are highlighted in the sentiment annotation step.

So how can I produce the annotated data from my word list and text by running a Prodigy function or recipe, without human annotation work? I found “custom rule-based logic” on https://prodi.gy/features/named-entity-recognition, but no more details. Code examples would be much appreciated!

thanks again!

ines · May 29, 2019, 11:44am

This page is mostly an overview of what's possible for NER with Prodigy. For more details and API docs, check out the PRODIGY_README.html, which is available for download with Prodigy. This also includes the input formats and how to represent entity spans etc. as JSON.

Here's an example of a task with a highlighted entity span:

{
    "text": "Hello Apple",
    "spans": [{ "start": 6, "end": 11, "label": "ORG"}]
}

At a minimum, you need the start and end character offsets into the text, and the label. If you have that annotated already, it should hopefully be pretty straightforward to write a script that converts it to a list of dictionaries with "text" and "spans".
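For the word-list case, here is a minimal sketch of such a script in plain Python. It doesn't use any Prodigy API; `find_entity_spans`, `to_task` and the two-term list are hypothetical names and data for illustration, and the matching rule (whole-word, case-insensitive) is an assumption you may need to adjust for your terms.

```python
import re

def find_entity_spans(text, terms, label):
    """Return a span dict for every whole-word, case-insensitive match of a term."""
    if not terms:
        return []
    # Try longer terms first so multi-word names win over their substrings.
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        {"start": match.start(), "end": match.end(), "label": label}
        for match in pattern.finditer(text)
    ]

def to_task(text, terms, label="DRUG"):
    """Build one task dict with "text" and "spans" (character offsets)."""
    return {"text": text, "spans": find_entity_spans(text, terms, label)}

# Hypothetical stand-in for the real list of ~20 thousand entity words.
drug_terms = ["Lipitor", "Crestor"]
text = "I took the Lipitor and it improved my situation after a week."
task = to_task(text, drug_terms)
print(task)
```

Each resulting dictionary can be written out as one line of JSON and used as pre-annotated input. With a very large term list, a single regex alternation may get slow, so a dedicated matcher could be a better fit.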

You can then add the aspect options to the examples as described above, which will create input examples that look something like this:

{
    "text": "Hello Apple",
    "spans": [{ "start": 6, "end": 11, "label": "ORG"}],
    "options": [
        {"id": 0, "text": "Aspect 1"},
        {"id": 1, "text": "Aspect 2"}
    ]
}

In the UI, the example above will be displayed as a text with "Apple" highlighted as "ORG", and two multiple-choice options to choose from. When you select an option, its ID will be added to the task as the "accept" key. For example:

{
    "text": "Hello Apple",
    "spans": [{ "start": 6, "end": 11, "label": "ORG"}],
    "options": [
        {"id": 0, "text": "Aspect 1"},
        {"id": 1, "text": "Aspect 2"}
    ],
    "accept": [1],
    "answer": "accept"
}

After annotation, you can then export the dataset using db-out, and for each entity highlighted in the text, you'll have its offsets into the text, as well as the selected aspect option(s).
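As a rough sketch of what you could do with the exported data: the function below reads JSONL lines in the format shown above and pairs each highlighted entity with the text of the selected option(s). `iter_span_choices` is a hypothetical helper, not part of Prodigy, and the sample line is just the example task from this post.

```python
import json

def iter_span_choices(jsonl_lines):
    """Yield one record per span, with the text of the selected options."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        eg = json.loads(line)
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored tasks
        # Map option IDs to their display text.
        option_text = {opt["id"]: opt["text"] for opt in eg.get("options", [])}
        chosen = [option_text.get(opt_id, opt_id) for opt_id in eg.get("accept", [])]
        for span in eg.get("spans", []):
            yield {
                "entity": eg["text"][span["start"]:span["end"]],
                "label": span["label"],
                "start": span["start"],
                "end": span["end"],
                "choices": chosen,
            }

# One exported line, mirroring the annotated task shown above.
exported_line = json.dumps({
    "text": "Hello Apple",
    "spans": [{"start": 6, "end": 11, "label": "ORG"}],
    "options": [{"id": 0, "text": "Aspect 1"}, {"id": 1, "text": "Aspect 2"}],
    "accept": [1],
    "answer": "accept",
})
rows = list(iter_span_choices([exported_line]))
print(rows)
```

In practice you would pass an open file handle for the exported JSONL file instead of the one-element list.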

v-juma1 · June 3, 2019, 2:50pm

thanks.

If more than one person annotates data on the same task, how can I know who annotated each item in the annotated dictionaries?

I have added “?session=xxx” to the URL, so it looks like http://localhost:8080/?session=majun, but there is only “_session_id”:“drug-default” in the annotated dictionaries.

There are only a few members in my annotator team, and I want to add something like an “annotator id” to each annotated item. How can I achieve that with just Prodigy, not Prodigy Scale?

ines · June 3, 2019, 4:42pm

Yes, this should work! Did you double-check that you actually saved annotations from the different sessions?
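One quick way to check is to count the exported examples per "_session_id". This is only a sketch: `count_by_session` is a hypothetical helper, and the "drug-majun" value is an assumption about what a named session would look like next to the "drug-default" value you reported.

```python
from collections import Counter

def count_by_session(examples):
    """Count annotated examples per "_session_id" value."""
    return Counter(eg.get("_session_id", "unknown") for eg in examples)

# Hypothetical exported examples; only the session field matters here.
examples = [
    {"text": "a", "_session_id": "drug-default"},
    {"text": "b", "_session_id": "drug-majun"},
    {"text": "c", "_session_id": "drug-majun"},
]
counts = count_by_session(examples)
print(counts)
```

If only one session ID shows up, the annotations from the other sessions were probably never saved.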

v-juma1 · June 3, 2019, 5:33pm

I found the reason: maybe I should set “auto_exclude_current”: false

Thank you!!

v-juma1 · June 3, 2019, 6:18pm

By the way, I have 25 distinct texts in the input JSON file and I have set “batch_size”: 100 in prodigy.json, but there are only 10 annotated texts in the annotated result. Is there anywhere else I can set it?

ines · June 4, 2019, 9:07am

How did you annotate the data? Did all 25 examples show up? And did you make sure that the annotators saved the annotations by clicking the “save” button after they were done? Prodigy will keep the most recent examples in the app so you can quickly undo, and then send the answers back in batches automatically once a batch is full. With a batch size of 100, that’s not going to happen before the 25 examples are annotated – so you’ll definitely have to save manually.

v-juma1 · June 4, 2019, 11:54am

Yeah, the annotators were under the mistaken impression that Prodigy saves data automatically. It’s OK now. I have another question about my recipe; a screenshot of the Prodigy web page is below:

I want to set the choices for two drugs on the same page separately. How can I get the accept and answer for each drug?

ines · June 4, 2019, 11:57am

Glad it works now! And yes, it does save it automatically as well – but only in batches. So to be safe, always hit "save" at the end.

When you stream in the data, you could iterate over the spans and create a new example for each span. Just like in the code snippet I posted above. I just noticed that I had a typo in there – yield eg should be yield new_eg:

import copy

def get_stream():
    for eg in stream:  # your pre-annotated examples
        spans = eg.get("spans", [])  # the entity spans
        for span in spans:
            new_eg = copy.deepcopy(eg)
            new_eg["spans"] = [span]  # one span per example
            new_eg["options"] = options  # add choice options
            yield new_eg

For each span (highlighted drug), this will create a new example with options and only a single span and then send it out for annotation.
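To make this concrete for a text with two drugs, here is a self-contained sketch of the same idea that runs without Prodigy. The function name `one_example_per_span`, the sentence and the two sentiment options are hypothetical and only there for illustration.

```python
import copy

def one_example_per_span(examples, options):
    """Yield one choice task per entity span of each pre-annotated example."""
    for eg in examples:
        for span in eg.get("spans", []):
            new_eg = copy.deepcopy(eg)
            new_eg["spans"] = [span]  # keep only the span being judged
            new_eg["options"] = copy.deepcopy(options)  # no shared state between tasks
            yield new_eg

# Hypothetical sentiment options for the choice interface.
OPTIONS = [
    {"id": "positive", "text": "positive"},
    {"id": "negative", "text": "negative"},
]

# Hypothetical pre-annotated example containing two drugs.
examples = [{
    "text": "I switched from Lipitor to Crestor.",
    "spans": [
        {"start": 16, "end": 23, "label": "DRUG"},
        {"start": 27, "end": 34, "label": "DRUG"},
    ],
}]

tasks = list(one_example_per_span(examples, OPTIONS))
for task in tasks:
    span = task["spans"][0]
    print(task["text"][span["start"]:span["end"]], [opt["id"] for opt in task["options"]])
```

The one input text becomes two tasks, so each drug gets its own "accept" and "answer" once annotated.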

v-juma1 · June 4, 2019, 12:35pm

Thanks a lot. I had used this code incorrectly; I have fixed the error and it is OK now. There is just one small problem, shown in this screenshot:

How can I get progress shown in this bar?

v-juma1 · June 5, 2019, 4:13am

And how can I set the Task queue depth by Controller?

thanks!!

ines · June 5, 2019, 8:46am

There are two things here: The progress is updated via the server, so you first need to submit a batch of answers. If you want accurate progress, you probably want a batch size smaller than 100.

Second, the progress isn't always displayed by default, because streams are generators and can technically be infinite. So the stream either needs to expose a __len__, or you need to return a custom progress function from your recipe. If you don't have too many examples, the easiest solution would be to convert your stream to a list:

stream = get_stream()
stream = list(stream)
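To see why the conversion matters, here is a small plain-Python illustration. The `get_stream` below is a hypothetical stand-in that yields 25 dummy examples; it is not your recipe's stream.

```python
def get_stream():
    """Hypothetical stand-in stream that yields 25 dummy examples."""
    for i in range(25):
        yield {"text": f"example {i}"}

stream = get_stream()
has_len_before = hasattr(stream, "__len__")  # generators expose no length

stream = list(stream)  # loads every example into memory
has_len_after = hasattr(stream, "__len__")
total = len(stream)

print(has_len_before, has_len_after, total)
```

The trade-off is that all examples are loaded into memory up front, which is fine for a small dataset like this one.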

As for the "Task queue depth" message, sorry if this was confusing – this is a logging message displayed by the library that serves the app, and has nothing to do with Prodigy.

For all other API related questions, check out the PRODIGY_README.html.
