Hi all,
I am trying to annotate a corpus for both NER and POS tagging using Prodigy, with a custom recipe so that both tasks can be done in a single Prodigy user interface. When I run Prodigy, the UI is displayed correctly. The problem is that the two sets of annotations affect each other when I annotate the text, as shown in the attached image. Can you help me make Prodigy treat them as separate, independent tasks? Here is my custom recipe script.
import spacy
import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string
@prodigy.recipe(
"pos_ner_annotation",
dataset=("Dataset name", "positional", None, str),
source=("Input data (JSONL)", "positional", None, str),
spacy_model=("spaCy model", "option", "m", str, "en_core_web_sm"),
pos_labels=("Comma-separated POS labels", "option", "p", str, ""),
ner_labels=("Comma-separated NER labels", "option", "n", str, "")
)
def pos_ner_annotation(dataset, source, spacy_model="blank:id", pos_labels="", ner_labels=""):
"""
A Prodigy recipe that allows simultaneous POS tagging and NER annotation in the same UI,
ensuring that both are completely independent.
"""
nlp = spacy.load(spacy_model)
# Load the dataset
stream = JSONL(source)
# Process label inputs
pos_labels = split_string(pos_labels) if pos_labels else list(nlp.get_pipe("tagger").labels)
ner_labels = split_string(ner_labels) if ner_labels else list(nlp.get_pipe("ner").labels)
    def get_annotations():
        for eg in stream:
            doc = nlp(eg["text"])
            # Token-level POS annotation (must be a list)
            pos_tokens = [
                {
                    "text": tok.text,
                    "id": i,
                    "start": tok.idx,
                    "end": tok.idx + len(tok.text),
                    "tag": tok.pos_,
                }
                for i, tok in enumerate(doc)
            ]
            # Span-level NER annotation (must be a list)
            ner_spans = [
                {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                for ent in doc.ents
                if ent.label_ in ner_labels
            ]
            yield {
                "text": eg["text"],
                "tokens": pos_tokens,  # Ensures POS uses a list
                "spans": ner_spans,    # Ensures NER uses a list
                "meta": {"task": "POS + NER", "model": spacy_model},
                "_input_hash": hash(eg["text"] + "POS_NER"),
                "_task_hash": hash(eg["text"] + "POS_NER"),
            }
    return {
        "dataset": dataset,
        "view_id": "blocks",
        "stream": get_annotations(),
        "config": {
            "blocks": [
                {"view_id": "pos_manual", "labels": pos_labels},  # POS tagging
                {"view_id": "ner_manual", "labels": ner_labels},  # NER annotation
            ],
            "exclude_by": "task",
            "show_tok_ranks": True,
            "tokens": True,
        },
    }
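For reference, I start the recipe with a command along these lines (the dataset name, file paths, and labels below are just placeholders, not my real values):

prodigy pos_ner_annotation my_dataset ./corpus.jsonl -F ./pos_ner_recipe.py -m blank:id -p NOUN,VERB,ADJ -n PERSON,ORG,GPE

Each line of the source JSONL is a plain record with a "text" field, e.g. {"text": "..."}.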
Thank you.