How to build an ABSA (Aspect-Based Sentiment Analysis) annotation recipe with Prodigy?

Hi,
Thanks to the Prodigy team, it is a great tool.
I am trying to build an ABSA annotation recipe. I have a one-line text like this:

“I took the Lipitor and it improved my situation after a week.”

I want to add the annotation DRUG{effect: positive} to the word “Lipitor”. That is, once I have tagged “Lipitor” as “DRUG”, I want to choose “positive” or “negative” as the value of the entity’s attribute. How can I build this recipe? Are there any suggestions?

thanks,
majun

Hi! You probably want to be doing this in separate steps: first, make sure you get the entities / top-level spans annotated correctly and adjust your label scheme if necessary (e.g. if you realise that some things are ambiguous).

Next, you can export the created annotations and maybe use the choice interface to annotate the sentiment – see here for an example. In your recipe, you could do something like this to create an example with different options for each span in your pre-annotated data:

import copy

# stream (pre-annotated examples) and options (choice options)
# are assumed to be defined elsewhere in your recipe
def get_stream():
    for eg in stream:  # your pre-annotated examples
        spans = eg.get("spans", [])  # the entity spans
        for span in spans:
            new_eg = copy.deepcopy(eg)
            new_eg["spans"] = [span]  # one span per example
            new_eg["options"] = options  # add choice options
            yield new_eg

Edit: I had a typo in there – it said yield eg instead of yield new_eg.

This will give you one annotation example per entity you’ve highlighted, and the option to select one (or more, depending on your configuration) sentiment values. The advantage of doing it in two steps is also that it makes it easier to focus – first, all you need to think about is whether something is a drug. In the second step, all you need to think about is the sentiment. It also gives you a chance to start over and revise the label scheme if it turns out you need more fine-grained attributes.
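As a rough, self-contained sketch of the second step, the snippet below runs the same idea outside of a recipe. The function name one_span_per_example and the OPTIONS list (with string IDs "positive" and "negative") are assumptions made for illustration; they are not part of Prodigy.

```python
import copy

# Hypothetical choice options for the drug's effect; adjust to your scheme.
OPTIONS = [
    {"id": "positive", "text": "positive"},
    {"id": "negative", "text": "negative"},
]


def one_span_per_example(examples, options):
    """Yield one choice task per pre-annotated span."""
    for eg in examples:
        for span in eg.get("spans", []):
            new_eg = copy.deepcopy(eg)  # leave the input example untouched
            new_eg["spans"] = [span]  # one span per example
            new_eg["options"] = options  # add choice options
            yield new_eg


text = "I took the Lipitor and it improved my situation after a week."
start = text.index("Lipitor")
examples = [
    {
        "text": text,
        "spans": [{"start": start, "end": start + len("Lipitor"), "label": "DRUG"}],
    }
]
tasks = list(one_span_per_example(examples, OPTIONS))
```

Each item in tasks has the same shape as the input example, plus the "options" key, so it can be streamed into the choice interface.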

Thanks a lot! I will try this pipeline.

I have a list of entity words (about 20 thousand) and I want to annotate my text with these entities. The annotated data will be used in the next step for sentiment annotation. Since I already have all the entity words, there is no need for a human to annotate the entities in the Prodigy web application, but I still need the data in the annotated format, with highlighted entities, for the sentiment annotation step.

So how can I produce the annotated data from my word list and text by running a Prodigy function or recipe, without human annotation work? I found “custom rule-based logic” at https://prodi.gy/features/named-entity-recognition, but no further details. Code examples would be much appreciated!

thanks again!

This page is mostly an overview of what’s possible for NER with Prodigy. For more details and API docs, check out the PRODIGY_README.html, which is available for download with Prodigy. This also includes the input formats and how to represent entity spans etc. as JSON.

Here’s an example of a task with a highlighted entity span:

{
    "text": "Hello Apple",
    "spans": [{ "start": 6, "end": 11, "label": "ORG"}]
}

At a minimum, you need the start and end character offset into the text, and the label. If you have that annotated already, it should hopefully be pretty straightforward to write a script that converts it to a list of dictionaries with "text" and "spans".
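If you only have a word list and raw text, one possible way to produce this format is a small matching script. The sketch below uses plain Python regular expressions rather than any Prodigy function; the names find_spans and make_examples, the "DRUG" default label, and the choice of whole-word, case-insensitive matching are all assumptions you would adapt to your data.

```python
import re


def find_spans(text, terms, label):
    """Return spans for whole-word, case-insensitive matches of any term."""
    if not terms:
        return []
    # Longest terms first so multi-word names win over their prefixes.
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        {"start": m.start(), "end": m.end(), "label": label}
        for m in pattern.finditer(text)
    ]


def make_examples(texts, terms, label="DRUG"):
    """Yield {"text", "spans"} dictionaries for texts with at least one match."""
    for text in texts:
        spans = find_spans(text, terms, label)
        if spans:
            yield {"text": text, "spans": spans}


texts = [
    "I took the Lipitor and it improved my situation after a week.",
    "Nothing relevant in this sentence.",
]
examples = list(make_examples(texts, ["Lipitor", "Aspirin"]))
```

The word-boundary check assumes each term starts and ends with a letter or digit. For a list of around 20 thousand terms, a dedicated matcher such as spaCy's PhraseMatcher may scale better than one large regular expression.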

You can then add the aspect options to the examples as described above, which will create input examples that look something like this:

{
    "text": "Hello Apple",
    "spans": [{ "start": 6, "end": 11, "label": "ORG"}],
    "options": [
        {"id": 0, "text": "Aspect 1"},
        {"id": 1, "text": "Aspect 2"}
    ]
}

In the UI, the example above will be displayed as a text with “Apple” highlighted as “ORG”, and two multiple-choice options to choose from. When you select an option, its ID will be added to the task as the "accept" key. For example:

{
    "text": "Hello Apple",
    "spans": [{ "start": 6, "end": 11, "label": "ORG"}],
    "options": [
        {"id": 0, "text": "Aspect 1"},
        {"id": 1, "text": "Aspect 2"}
    ],
    "accept": [1],
    "answer": "accept"
}

After annotation, you can then export the dataset using db-out, and for each entity highlighted in the text, you’ll have its offsets into the text, as well as the selected aspect option(s).
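To illustrate how the exported records might be post-processed, here is a minimal sketch that pairs each highlighted entity with the selected option texts. It assumes every record has exactly one span (as produced by the one-span-per-example stream) and the keys shown above; the helper name selected_options is hypothetical.

```python
def selected_options(record):
    """Return (entity text, label, chosen option texts) for an accepted task."""
    if record.get("answer") != "accept":
        return None  # skip rejected or ignored tasks
    span = record["spans"][0]  # assumes one span per example
    by_id = {opt["id"]: opt["text"] for opt in record.get("options", [])}
    chosen = [by_id[i] for i in record.get("accept", [])]
    entity = record["text"][span["start"]:span["end"]]
    return entity, span["label"], chosen


record = {
    "text": "Hello Apple",
    "spans": [{"start": 6, "end": 11, "label": "ORG"}],
    "options": [
        {"id": 0, "text": "Aspect 1"},
        {"id": 1, "text": "Aspect 2"},
    ],
    "accept": [1],
    "answer": "accept",
}
result = selected_options(record)  # ("Apple", "ORG", ["Aspect 2"])
```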

Thanks.

If more than one person is annotating the same task, how can I tell who annotated each item in the annotated dictionaries?

I have added “?session=xxx” to the URL, so it looks like http://localhost:8080/?session=majun, but the annotated dictionaries only contain “_session_id”:“drug-default”.

There are only a few members in my annotation team, and I want to add something like an “annotator ID” to each annotated item. How can I achieve that with Prodigy alone, without Prodigy Scale?

Yes, this should work! Did you double-check that you actually saved annotations from the different sessions?
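One quick way to check which sessions actually have saved annotations is to count the exported records per "_session_id". The sketch below assumes records is a list of dictionaries loaded from your exported data. The value "drug-majun" is a hypothetical session ID used only for illustration; only "drug-default" appears in this thread.

```python
from collections import Counter


def count_by_session(records):
    """Count annotated records per "_session_id" value."""
    return Counter(r.get("_session_id", "unknown") for r in records)


records = [
    {"text": "a", "_session_id": "drug-default"},
    {"text": "b", "_session_id": "drug-default"},
    {"text": "c", "_session_id": "drug-majun"},  # hypothetical named session
]
counts = count_by_session(records)
```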

I found the reason: maybe I should set “auto_exclude_current”: false.

Thank you!!

By the way, I have 25 distinct texts in the input JSON file and have set “batch_size”: 100 in prodigy.json, but there are only 10 annotated texts in the result. Is there anywhere else I need to set it?

How did you annotate the data? Did all 25 examples show up? And did you make sure that the annotators saved the annotations by clicking the “save” button after they were done? Prodigy will keep the most recent examples in the app so you can quickly undo, and then send the answers back in batches automatically once a batch is full. With a batch size of 100, that’s not going to happen before the 25 examples are annotated, so you’ll definitely have to save manually.

Yeah, the annotators were under the mistaken impression that Prodigy saves the data automatically. It’s OK now. I have another question about my recipe. Here is a screenshot of the Prodigy web page:

I want to set the choices for the two drugs on the same page separately. How can I get the accept and answer values for each drug?

Glad it works now! And yes, it does save it automatically as well, but only in batches. So to be safe, always hit “save” at the end 🙂

When you stream in the data, you could iterate over the spans and create a new example for each span. Just like in the code snippet I posted above. I just noticed that I had a typo in there – yield eg should be yield new_eg:

import copy

# stream (pre-annotated examples) and options (choice options)
# are assumed to be defined elsewhere in your recipe
def get_stream():
    for eg in stream:  # your pre-annotated examples
        spans = eg.get("spans", [])  # the entity spans
        for span in spans:
            new_eg = copy.deepcopy(eg)
            new_eg["spans"] = [span]  # one span per example
            new_eg["options"] = options  # add choice options
            yield new_eg

For each span (highlighted drug), this will create a new example with options and only a single span and then send it out for annotation.

Thanks a lot. I used this code incorrectly at first, but I have fixed the error and it works now. There is just one small problem, as shown in the screenshot:

How do I get this bar to show the progress?

And how can I set the Task queue depth reported by Controller?

Thanks!!

There are two things here: The progress is updated via the server, so you first need to submit a batch of answers. If you want an accurate progress, you probably want a batch size smaller than 100.

Second, the progress isn’t always displayed by default, because streams are generators and can technically be infinite. So the stream either needs to expose a __len__, or you need to return a custom progress function from your recipe. If you don’t have too many examples, the easiest solution would be to convert your stream to a list:

stream = get_stream()
stream = list(stream)
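The difference is plain Python behaviour and can be seen in a short sketch: a generator has no length, while the list built from it does. The 25 dummy examples below are a stand-in for your own stream.

```python
def get_stream():
    # Stand-in generator: 25 dummy examples instead of real tasks.
    for i in range(25):
        yield {"text": f"example {i}"}


stream = get_stream()
has_len_before = hasattr(stream, "__len__")  # generators expose no __len__

stream = list(stream)  # consumes the generator and loads everything in memory
has_len_after = hasattr(stream, "__len__")  # lists do expose __len__
total = len(stream)
```

Converting to a list loads every example into memory at once, which is why this only suits smaller datasets.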

As for the “Task queue depth” message: sorry if this was confusing. It is a logging message displayed by the library that serves the app, and has nothing to do with Prodigy.

For all other API related questions, check out the PRODIGY_README.html.