sublabel for NER

How can I annotate an entity that has multiple attributes in Prodigy? For example, take the sentence "Patient did not complain of wheezing".

In this case:
DiseaseEntity=wheezing
Polarity=Negative

This is how it is represented in UIMA:
<textsem:DiseaseAnnotation xmi:id="36944" sofa="1" begin="9747" end="9752" polarity="0" uncertainty="0" conditional="false" generic="false" historyOf="0" FocusText="wheezing" Scope="[PREN]"/>

Hi! The NER interface is really optimised for actual named entity recognition tasks where the goal is to predict single token-based tags. One option would be to make two passes over the data and use the first pass to focus on getting the boundaries right (e.g. DiseaseEntity) and then annotate all additional attributes in the next round (e.g. using a multiple choice interface). If you're doing two passes, you can also use the first round to make sure all entities are labelled and that the annotation scheme is implemented correctly, before going deeper.

I've also seen approaches that designed a very similar annotation task as a relation annotation task (maybe that was somewhere in the medical tag) – but I think in that case, the goal was to also capture the indicator for the polarity (i.e. "not"). If that's not so important for you, that might be overkill.

The multiple choice interface makes sense, but could you walk me through how to use it? This is what I tried:

prodigy ner.manual ner_2 ./exported_file.jsonl --view-id choice

usage: prodigy ner.manual [-h] [-lo None] [-l None] [-pt None] [-e None] [-C] dataset spacy_model source
prodigy ner.manual: error: unrecognized arguments: --view-id

Hi! Sorry if the way I phrased this was a bit confusing – the idea I suggested was to start by labelling the entities with the top-level label (e.g. Disease) using ner.manual. In the next step, you could then load the existing dataset into a different recipe that uses the multiple choice UI and goes through every single entity and asks you to annotate additional labels.

For example, your custom recipe could look something like this. You can read more about custom recipes and how they work here: https://prodi.gy/docs/custom-recipes

import prodigy
from prodigy.components.db import connect
import copy

@prodigy.recipe("entity-sublabels")
def entity_sublabels(dataset, source_dataset):
    db = connect()
    stream = db.get_dataset(source_dataset)  # load the existing NER annotations
    # The multiple choice options to add: you can set them up however you like
    options = [
        {"id": "Polarity=Positive", "text": "Polarity: positive"},
        {"id": "Polarity=Negative", "text": "Polarity: negative"}
    ]
    
    def get_stream(stream):
        for eg in stream:
            for span in eg.get("spans", []):
                # Create a new example for each annotated span and add choice options
                task = copy.deepcopy(eg)
                task["spans"] = [span]
                task["options"] = options
                yield task

    stream = get_stream(stream)

    return {
        "dataset": dataset,   # dataset to save annotations to
        "view_id": "choice",
        "stream": stream
    }

You can run it with prodigy entity-sublabels your_final_dataset your_ner_dataset -F recipe.py, where recipe.py is the file containing the code.

There are various things you can add to make the process more efficient: for example, a good trick is to group the same entities together, so you can annotate all instances of "wheezing" in a row. Sorting by frequency is a nice trick as well, because you'll likely have an uneven distribution of entities, with some being super common and others being quite rare.
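As a rough illustration of the grouping and frequency-sorting idea, here is a minimal sketch in plain Python. It assumes each task holds exactly one span (as produced by the recipe above) and that spans carry character offsets under "start" and "end", as in Prodigy's NER format. The helper name sort_tasks_by_entity_frequency is hypothetical, and the function loads the whole stream into memory, so it only suits datasets that fit there.

```python
from collections import Counter


def sort_tasks_by_entity_frequency(tasks):
    """Return single-span tasks grouped by entity text, most frequent first.

    Assumes every task has a "text" string and one span with "start"/"end"
    character offsets. The function name is illustrative only.
    """

    def entity_text(task):
        span = task["spans"][0]
        # Lowercase so "Wheezing" and "wheezing" land in the same group
        return task["text"][span["start"]:span["end"]].lower()

    tasks = list(tasks)  # materialise the stream so it can be counted and sorted
    counts = Counter(entity_text(task) for task in tasks)
    # Sort by descending frequency, then by entity text to keep groups together
    return sorted(tasks, key=lambda t: (-counts[entity_text(t)], entity_text(t)))
```

Inside the recipe, this could be applied as stream = sort_tasks_by_entity_frequency(get_stream(stream)) before returning the components.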

Thank you

Hi

I tried to implement this in Prodigy. The choice options loaded and I was able to make selections; however, the selection isn't linked to the top-level label. Any idea how to keep it linked to the top-level label? Also, if there is more than one person in an image, the script only loads one person per image, not all of them. Any guidance would be appreciated.

        for eg in stream:
            for span in eg.get("spans", []):
                if span["label"] == "person":
                    task = copy.deepcopy(eg)
                    task['spans'] = [span]
                    task['options'] = options
            # eg["options"] = options
                yield task  

Here is an example of the spans for one image:

"spans": [
{"label": "person", "model": "yolov3-opencv", "box": [409, 13, 360, 653], "points": [[409, 13], [769, 13], [769, 666], [409, 666]], "confidence": 0.9993869662284851, "color": "#ff00ff", "qid": "1_person", "person_filters_yolo": [0, 1]}, 
{"label": "person", "model": "yolov3-opencv", "box": [131, 147, 293, 515], "points": [[131, 147], [424, 147], [424, 662], [131, 662]], "confidence": 0.9951488375663757, "color": "#ff00ff", "qid": "2_person", "person_filters_yolo": [0, 1]}, 
{"label": "person", "probability": 0.9997418522834778, "box": [149, 153, 256, 506], "points": [[149, 153], [405, 153], [405, 659], [149, 659]], "model": "facebook/detr-resnet-50", "color": "#ffff00", "qid": "61_person", "person_filters": [61, 98], "person_number": "person_0"}, 
{"label": "person", "probability": 0.9996631145477295, "box": [422, 32, 346, 625], "points": [[422, 32], [768, 32], [768, 657], [422, 657]], "model": "facebook/detr-resnet-50", "color": "#ffff00", "qid": "98_person", "person_filters": [61, 98], "person_number": "person_1"}]

Providing another support issue that tried to address this question.
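
For reference, a minimal sketch of the per-span loop, written in plain Python so it can be tried without Prodigy. In the snippet above, yield task appears to sit outside the if block, so a non-person span would re-yield the previous task (or fail if no task exists yet); here the yield only happens for matching spans. Each task keeps its single span, including the label and qid, so the selected option stays attached to that span when the task is saved. Dropping the copied _input_hash and _task_hash keys is an untested assumption: if every task from one image carries the same hashes, they may be treated as duplicates, which could explain why only one person per image shows up. The function name one_task_per_span is hypothetical.

```python
import copy


def one_task_per_span(examples, options, label="person"):
    """Yield one choice task per span with the given label.

    Plain-Python sketch; the function name and the hash handling are
    assumptions, not part of Prodigy's API.
    """
    for eg in examples:
        for span in eg.get("spans", []):
            if span.get("label") != label:
                continue  # skip spans with other labels instead of re-yielding
            task = copy.deepcopy(eg)
            task["spans"] = [span]      # keeps label, qid, box, etc. on the task
            task["options"] = options
            # Assumption: hashes copied from the source example are identical
            # for all spans of one image, so remove them and let them be
            # recomputed for the new single-span task.
            task.pop("_input_hash", None)
            task.pop("_task_hash", None)
            yield task
```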