sublabel for NER

tiktok · May 13, 2021, 12:26am

How can I annotate an entity that has multiple attributes in Prodigy? For example, take the sentence "Patient did not complain of wheezing".

In this case:
DiseaseEntity=wheezing
Polarity=Negative

This is how it is represented in UIMA:
<textsem:DiseaseAnnotation xmi:id="36944" sofa="1" begin="9747" end="9752" polarity="0" uncertainty="0" conditional="false" generic="false" historyOf="0" FocusText="wheezing" Scope="[PREN]"/>

ines · May 14, 2021, 10:31am

Hi! The NER interface is really optimised for actual named entity recognition tasks where the goal is to predict single token-based tags. One option would be to make two passes over the data and use the first pass to focus on getting the boundaries right (e.g. DiseaseEntity) and then annotate all additional attributes in the next round (e.g. using a multiple choice interface). If you're doing two passes, you can also use the first round to make sure all entities are labelled and that the annotation scheme is implemented correctly, before going deeper.

I've also seen approaches that designed a very similar annotation task as a relation annotation task (maybe that was somewhere in the medical tag) – but I think in that case, the goal was to also capture the indicator for the polarity (i.e. "not"). If that's not so important for you, that might be overkill.

tiktok · May 14, 2021, 1:31pm

The multiple choice interface makes sense, but could you walk me through how to use it?

tiktok · May 14, 2021, 1:42pm

prodigy ner.manual ner_2 ./exported_file.jsonl --view-id choice

usage: prodigy ner.manual [-h] [-lo None] [-l None] [-pt None] [-e None] [-C] dataset spacy_model source
prodigy ner.manual: error: unrecognized arguments: --view-id

ines · May 15, 2021, 1:48am

Hi! Sorry if the way I phrased this was a bit confusing – the idea I suggested was to start by labelling the entities with the top-level label (e.g. Disease) using ner.manual. In the next step, you could then load the existing dataset into a different recipe that uses the multiple choice UI and goes through every single entity and asks you to annotate additional labels.

For example, your custom recipe could look something like this. You can read more about custom recipes and how they work here: https://prodi.gy/docs/custom-recipes

import prodigy
from prodigy.components.db import connect
import copy

@prodigy.recipe("entity-sublabels")
def entity_sublabels(dataset, source_dataset):
    db = connect()
    stream = db.get_dataset(source_dataset)  # load the existing NER annotations
    # The multiple choice options to add: you can set them up however you like
    options = [
        {"id": "Polarity=Positive", "text": "Polarity: positive"},
        {"id": "Polarity=Negative", "text": "Polarity: negative"}
    ]
    
    def get_stream(stream):
        for eg in stream:
            for span in eg.get("spans", []):
                # Create a new example for each annotated span and add choice options
                task = copy.deepcopy(eg)
                task["spans"] = [span]
                task["options"] = options
                yield task

    stream = get_stream(stream)

    return {
        "dataset": dataset,   # dataset to save annotations to
        "view_id": "choice",
        "stream": stream
    }

You can run it with prodigy entity-sublabels your_final_dataset your_ner_dataset -F recipe.py, where recipe.py is the file containing the code.
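Once the second pass is done, each saved task should contain the single span it was created for, plus the IDs of the selected options under "accept". The sketch below shows one way the two could be joined afterwards. It assumes the "Name=Value" option IDs from the recipe above; the function name attach_sublabels and the "attrs" key are made up for illustration and are not part of Prodigy.

```python
def attach_sublabels(annotated_tasks):
    """Copy the selected option IDs onto the single span of each accepted task."""
    results = []
    for task in annotated_tasks:
        # Skip rejected/ignored tasks and tasks that carry no span
        if task.get("answer") != "accept" or not task.get("spans"):
            continue
        span = dict(task["spans"][0])  # the recipe creates one span per task
        attrs = {}
        for option_id in task.get("accept", []):
            # Option IDs are assumed to follow the "Name=Value" pattern
            name, _, value = option_id.partition("=")
            attrs[name] = value
        span["attrs"] = attrs  # "attrs" is a hypothetical key, not a Prodigy field
        results.append({"text": task.get("text"), "span": span})
    return results
```

You could feed it the examples loaded from your final dataset (for instance via db.get_dataset) and then export the result in whatever format your downstream pipeline expects.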

There are various things you can add to make the process more efficient: for example, a good trick is to group the same entities together, so you can annotate all instances of "wheezing" in a row. Sorting by frequency is a nice trick as well, because you'll likely have an uneven distribution of entities, with some being super common and others being quite rare.
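As a rough sketch of the grouping and sorting idea, the helper below builds one task per span and orders the tasks so that the most frequent entity texts come first and identical texts sit next to each other. It assumes text examples whose spans carry "start" and "end" character offsets; the function name sorted_span_tasks is hypothetical, and it holds all tasks in memory, which should be fine for a dataset that is already loaded as a list.

```python
import copy
from collections import Counter


def sorted_span_tasks(examples, options):
    """Create one choice task per span, most frequent entity text first."""
    keyed_tasks = []
    for eg in examples:
        for span in eg.get("spans", []):
            task = copy.deepcopy(eg)
            task["spans"] = [span]
            task["options"] = options
            # Lowercased surface text of the entity, used only for grouping
            key = eg["text"][span["start"]:span["end"]].lower()
            keyed_tasks.append((key, task))
    counts = Counter(key for key, _ in keyed_tasks)
    # Descending frequency first, then the text itself so equal texts stay adjacent
    keyed_tasks.sort(key=lambda item: (-counts[item[0]], item[0]))
    return [task for _, task in keyed_tasks]
```

In the recipe, this could replace the get_stream generator, e.g. stream = sorted_span_tasks(stream, options).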

tiktok · May 16, 2021, 5:59pm

Thank you

c00lcoder · December 29, 2022, 2:34pm

Hi

I tried to implement this in Prodigy. The choice options loaded and I was able to make selections; however, the selection isn't linked to the top-level label. Any idea how it can remain linked to the top-level label? Also, if there is more than one person, the script only loads one person per image, not all of them. Any guidance would be appreciated.

        for eg in stream:
            for span in eg.get("spans", []):
                if span["label"] == "person":
                    task = copy.deepcopy(eg)
                    task['spans'] = [span]
                    task['options'] = options
            # eg["options"] = options
                yield task

Here is an example of the spans for one image

"spans": [
{"label": "person", "model": "yolov3-opencv", "box": [409, 13, 360, 653], "points": [[409, 13], [769, 13], [769, 666], [409, 666]], "confidence": 0.9993869662284851, "color": "#ff00ff", "qid": "1_person", "person_filters_yolo": [0, 1]}, 
{"label": "person", "model": "yolov3-opencv", "box": [131, 147, 293, 515], "points": [[131, 147], [424, 147], [424, 662], [131, 662]], "confidence": 0.9951488375663757, "color": "#ff00ff", "qid": "2_person", "person_filters_yolo": [0, 1]}, 
{"label": "person", "probability": 0.9997418522834778, "box": [149, 153, 256, 506], "points": [[149, 153], [405, 153], [405, 659], [149, 659]], "model": "facebook/detr-resnet-50", "color": "#ffff00", "qid": "61_person", "person_filters": [61, 98], "person_number": "person_0"}, 
{"label": "person", "probability": 0.9996631145477295, "box": [422, 32, 346, 625], "points": [[422, 32], [768, 32], [768, 657], [422, 657]], "model": "facebook/detr-resnet-50", "color": "#ffff00", "qid": "98_person", "person_filters": [61, 98], "person_number": "person_1"}]

ryanwesslen · January 4, 2023, 4:24pm

Providing another support issue that tried to address this question.
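For the loop itself, here is a minimal sketch of how the per-person tasks could be generated. It keeps the yield inside the label check, so spans with other labels never re-emit an earlier task, and it records which span each task refers to so the choice answer stays tied to it. The function name person_tasks and the "span_index" entry under "meta" are hypothetical names chosen for illustration. A further point, stated as an assumption rather than something verified here: the copied tasks keep the hashes of the original image, so Prodigy may treat them as duplicates and show only one per image. Calling prodigy.set_hashes(task, overwrite=True) on each task before yielding it may help; that call is left out below so the sketch runs without Prodigy installed.

```python
import copy


def person_tasks(examples, options, label="person"):
    """Yield one choice task per span that carries the given label."""
    for eg in examples:
        for index, span in enumerate(eg.get("spans", [])):
            if span.get("label") != label:
                continue  # ignore spans with other labels
            task = copy.deepcopy(eg)
            task["spans"] = [span]
            task["options"] = options
            # Hypothetical bookkeeping: remember which span this question is about
            task["meta"] = {
                **task.get("meta", {}),
                "span_index": index,
                "qid": span.get("qid"),
            }
            yield task  # inside the label check, so only matching spans yield
```

Because each saved task then contains exactly one span together with the selected options in "accept", the sublabel and the top-level label can be read from the same record.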

