textcat annotation with diff highlight

tagucci · April 9, 2020, 8:21am

Hi,

I'm trying to annotate grammatical error labels while viewing the difference between the source sentence (the sentence with the error) and the target sentence (the corrected sentence).
However, the command below fails with an "unrecognized argument" error.

prodigy textcat.manual grammatical_err dataset.jsonl --label orthography,spelling,verb,form,tense,sva --view-id diff

The data format in dataset.jsonl is shown below.

{"accept": {"text": "This is the test."}, "reject": {"text": "This is a test."}}

How can I annotate text categories with diff highlighting?

ines · April 9, 2020, 8:42am

Hi! The --view-id argument doesn't exist on the textcat.manual recipe – see here for the available arguments.

It sounds like you want to combine the diff interface with the choice UI for multiple choice options, right? The easiest way to do this would be to use a custom recipe with blocks. Check out the docs here for details and examples: https://prodi.gy/docs/custom-interfaces#blocks You might also find my video tutorial helpful.

Your blocks could look like this:

blocks = [{"view_id": "diff"}, {"view_id": "choice"}]

And then each record in your data could look like this:

{
    "accept": {"text": "This is the test."},
    "reject": {"text": "This is a test."},
    "options": [
        {"id": "orthography", "text": "orthography"},
        # and so on...
    ]
}
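A minimal sketch of how records in this shape could be generated, so the option list is built once from the label names instead of being typed by hand. The helper names make_options and make_record are hypothetical and not part of Prodigy; the labels are the ones from the textcat.manual command earlier in the thread.

```python
import json

# Label names taken from the textcat.manual command earlier in this thread.
LABELS = ["orthography", "spelling", "verb", "form", "tense", "sva"]


def make_options(labels):
    """Build choice options, using each label as both id and display text."""
    return [{"id": label, "text": label} for label in labels]


def make_record(accept_text, reject_text, labels):
    """Combine the diff fields and the choice options in one task dict."""
    return {
        "accept": {"text": accept_text},
        "reject": {"text": reject_text},
        "options": make_options(labels),
    }


record = make_record("This is the test.", "This is a test.", LABELS)

# JSONL expects one JSON object per line, with double-quoted keys and strings.
print(json.dumps(record))
```

Writing one json.dumps(record) line per example to dataset.jsonl keeps the file valid JSONL.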

tagucci · April 10, 2020, 11:18am

@ines
Thanks for replying and detailed information!

Following your video tutorial, I wrote a custom recipe, recipe.py, as shown below.

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string

@prodigy.recipe(
    "grammar-error",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
)
def grammar_error(dataset, source):
    stream = JSONL(source)
    blocks = [{"view_id": "diff"}, {"view_id": "choice"}]
    options = [
         {"id": "orthography", "text": "orthography"},
         {"id": "spelling", "text": "spelling"},
         {"id": "verb", "text": "verb"},
         {"id": "form", "text": "form"},
         {"id": "tense", "text": "tense"},
         {"id": "sva", "text": "sva"}
    ]
    return {
        "stream": stream,
        "dataset": dataset,
        "view_id": "blocks",
        "config": {
            "labels": options,
            "blocks": blocks
        }
    }

Then I ran the command below.

prodigy grammar-error gec dataset.jsonl -F ./recipe.py

However, the following message was displayed when I launched the server.

Oops, something went wrong :(
You might have come across a bug in Prodigy's web app – sorry about that. We'd love to fix this, so feel free to open an issue on the Prodigy Support Forum and include the steps that led to this message.

How do I fix this error?

ines · April 10, 2020, 11:33am

Ah, Prodigy could probably fail more gracefully here. It looks like you forgot to actually add your options to the incoming task at "options"? Instead, you're just passing them in as the labels (which are what you'd use as the top-level labels for manual NER or image annotation).

See the docs here for how the options fit in: https://prodi.gy/docs/custom-recipes#example-choice
I'm also showing this in the video starting at ~32:00: https://youtu.be/zlyq9z7hdUA?t=1920

tagucci · April 10, 2020, 1:14pm

@ines
Hmm, I fixed the missing "options" in the stream, but something is still wrong.

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string

@prodigy.recipe(
    "grammar-error",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    )
def grammar_error(dataset, source):
    stream = JSONL(source)
    blocks = [
        {"view_id": "diff"},  
        {"view_id": "choice"},
        ]
    stream = add_options(stream)
    return {
        "stream": stream,
        "dataset": dataset,
        "view_id": "blocks",
        "config": {
            "blocks": blocks,
        }
    }

def add_options(stream):
    options = [
         {"id": "orthography", "text": "orthography"},
         {"id": "spelling", "text": "spelling"},
         {"id": "verb", "text": "verb"},
         {"id": "form", "text": "form"},
         {"id": "tense", "text": "tense"},
         {"id": "sva", "text": "sva"}
    ]
    for task in stream:
        task["options"] = options
        yield task

The server launched successfully when I changed blocks = [{"view_id": "diff"}, {"view_id": "choice"}] to blocks = [{"view_id": "diff"}], although this shows only the diff.
Why does it fail when I combine diff and choice?

ines · April 12, 2020, 9:47am

Your recipe looks good. I just had a look and it seems like the combination of choice and diff is the only one that's currently not possible because they both share and use the "accept" key differently. For the diff interface, it holds one version of the text, and for the choice interface, it holds the list of accepted options.

I will fix this for the next version and make the diff UI also accept a simpler format of maybe "inserted": "..." and "deleted": "..." so it doesn't clash with any other UI.
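To make the clash concrete, here is a small illustration in plain Python of the two shapes that end up under the same "accept" key. It only restates the formats described in this thread; the function accept_shapes_clash is a hypothetical helper for illustration and does not reflect how Prodigy checks tasks internally.

```python
# What the diff interface reads from a task (format used earlier in this thread):
# "accept" holds one version of the text.
diff_task = {
    "accept": {"text": "This is the test."},
    "reject": {"text": "This is a test."},
}

# What the choice interface uses the key for:
# "accept" holds the list of accepted option ids.
choice_answer = {
    "options": [{"id": "tense", "text": "tense"}],
    "accept": ["tense"],
}


def accept_shapes_clash(diff_like, choice_like):
    """Return True when the two tasks need different types under "accept"."""
    return type(diff_like["accept"]) is not type(choice_like["accept"])


print(accept_shapes_clash(diff_task, choice_answer))
```

A single task dict cannot hold both a dict and a list under "accept", which is why the two blocks cannot share one task in this version.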

tagucci · April 13, 2020, 12:19am

@ines
Thanks! I'm looking forward to using the newer Prodigy :)

ines · June 17, 2020, 4:52pm

Just released Prodigy v1.10, which supports a simpler data format for the diff interface that doesn't clash with the format of other interfaces like choice. See here for details.
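A rough sketch of converting existing records to a flat diff format so they no longer use the "accept" key. The key names "added" and "removed" are an assumption here and should be checked against the diff interface documentation for your Prodigy version, as should the mapping of "accept" to the added text and "reject" to the removed text. The function to_simple_diff is a hypothetical helper, not part of Prodigy.

```python
def to_simple_diff(task):
    """Return a copy of the task with flat diff keys instead of accept/reject.

    Assumption: the flat keys are named "added" and "removed", with the old
    "accept" text as the added version and the old "reject" text as the
    removed version. Verify both against the Prodigy docs.
    """
    # Keep every other field (for example "options") unchanged.
    converted = {k: v for k, v in task.items() if k not in ("accept", "reject")}
    converted["added"] = task["accept"]["text"]
    converted["removed"] = task["reject"]["text"]
    return converted


old_task = {
    "accept": {"text": "This is the test."},
    "reject": {"text": "This is a test."},
    "options": [{"id": "tense", "text": "tense"}],
}
new_task = to_simple_diff(old_task)
print(new_task)
```

With the text moved out of "accept", the choice block is free to use that key for the selected option ids.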
