Is it possible to do NER and Textcat Annotation together?

sigitpurnomo · October 22, 2024, 3:15pm

Dear Prodigy Support,

Is it possible to do both NER and text classification annotation at the same time? Or should I annotate the data twice, first using ner.manual and then using textcat.manual? I have a text dataset that will be used to develop both an NER model and a text classification model.

Thank you

magdaaniol · October 23, 2024, 3:11pm

Hi @sigitpurnomo ,

To make the annotation task as efficient as possible, we recommend doing your annotation in two passes: NER and textcat separately.
Prodigy's train and data-to-spacy commands will take care of merging the annotations for the purpose of training a spaCy pipeline with these two components.
Annotation is easier and less error-prone if annotators can focus on one task at a time.

That said, it is definitely possible to set up Prodigy to collect NER and textcat annotations at the same time via a custom recipe. You can combine different UIs using blocks: as long as your input file contains all the necessary info to fill in the ner_manual and classification view_ids, Prodigy will be able to render such a block.
See here for more info and examples.
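
As a rough illustration of what "all the necessary info" means for a combined block, here is a minimal sketch of a single task, assuming the NER block reads "text" and "tokens" and the multiple-choice block reads "options". The helper name make_task and the whitespace tokenization are hypothetical and only for illustration; in a real recipe the tokens would normally come from add_tokens with a spaCy tokenizer.

```python
def make_task(text, options):
    """Build one task dict carrying the keys both blocks need (hypothetical helper).

    Assumption: words are separated by single spaces. A real recipe would let
    Prodigy's add_tokens produce the tokens with a spaCy tokenizer instead.
    """
    words = text.split(" ")
    tokens = []
    offset = 0
    for i, word in enumerate(words):
        start = offset
        end = start + len(word)
        tokens.append({
            "text": word,
            "start": start,               # character offset where the token starts
            "end": end,                   # character offset where the token ends
            "id": i,                      # token index
            "ws": i < len(words) - 1,     # is the token followed by whitespace?
        })
        offset = end + 1                  # skip the single space
    return {"text": text, "tokens": tokens, "options": options}


# Example options for the multiple-choice part of the task (illustrative values)
example_options = [
    {"id": 0, "text": "Detect Problem"},
    {"id": 1, "text": "Detect Suggestion"},
]
example_task = make_task("The method is unclear", example_options)
```

If a task is missing the key that one of the blocks expects, that block has nothing to render, so it is worth checking a sample task before starting the server.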

sigitpurnomo · October 24, 2024, 5:11am

Thank you, @magdaaniol, for your suggestion.

I have created a custom recipe like this:

import prodigy
from prodigy.components.preprocess import add_tokens
from prodigy.components.stream import get_stream
import spacy

@prodigy.recipe("peer_review")
def peer_review_ner_cat(dataset, lang, file_path):
    # We can use the blocks to override certain config and content, and set
    # "text": None for the choice interface so it doesn't also render the text
    blocks = [
        {"view_id": "ner_manual"},
        {"view_id": "choice", "text": None}
    ]
    options = [
        {"id": 3, "text": "General Comments"},
        {"id": 2, "text": "Detect Localization"},
        {"id": 1, "text": "Detect Suggestion"},
        {"id": 0, "text": "Detect Problem"}
    ]

    nlp = spacy.blank(lang)                           # blank spaCy pipeline for tokenization
    stream = get_stream(file_path, loader="jsonl")       # set up the stream
    stream.apply(add_tokens, nlp=nlp, stream=stream)  # tokenize the stream for ner_manual

    return {
        "dataset": dataset,          # the dataset to save annotations to
        "view_id": "blocks",         # set the view_id to "blocks"
        "stream": stream,            # the stream of incoming examples
        "config": {
            "choice_style": "multiple",
            "labels": ["Problem", "Suggestion", "Localization","General"],  # the labels for the manual NER interface
            "blocks": blocks         # add the blocks to the config
        }
    }

But when I run Prodigy with this recipe using this command:
prodigy peer_review ner_cat_peer_review id dataset/peer-review-masdig.jsonl -F prodigy/recipe.py
there is an error in the web interface, as shown in this image:

Can you help me find what caused the error?

Thank you

sigitpurnomo · October 28, 2024, 5:46am

I have solved this problem based on the information in this post: https://support.prodi.gy/t/combine-ner-and-doc-classification-in-annotation-process/3314/3
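
For anyone landing here with the same error, below is a minimal sketch of that kind of fix. It assumes the problem was that the options list in my recipe was defined but never attached to the tasks, so the choice block had no "options" to render. The function name add_options is my own choice for this sketch, and the stream here is a plain list standing in for the Prodigy stream.

```python
def add_options(stream, options):
    """Yield every incoming task with the choice options attached."""
    for task in stream:
        task["options"] = options
        yield task


options = [
    {"id": 3, "text": "General Comments"},
    {"id": 2, "text": "Detect Localization"},
    {"id": 1, "text": "Detect Suggestion"},
    {"id": 0, "text": "Detect Problem"},
]

# Stand-in for the real stream: a plain list of task dicts (illustrative data)
sample_stream = [{"text": "Please add more experiments."}]
tasks = list(add_options(sample_stream, options))

# Assumption: inside the recipe, the wrapper would be applied to the stream
# following the same pattern as the add_tokens call, for example:
#     stream.apply(add_options, stream=stream, options=options)
```

How the wrapper is wired into the recipe may depend on the Prodigy version, so please check the custom recipes documentation for the version you use.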

Thank you

magdaaniol · October 28, 2024, 4:49pm

Great! Thanks for sharing the post with the solution!
