textcat annotation with diff highlight

Hi,

I'm trying to annotate grammatical error labels while viewing the difference between the source sentence (the sentence with the error) and the target sentence (the corrected sentence).
However, a command like the one below fails with an "unrecognized argument" error.

prodigy textcat.manual grammatical_err dataset.jsonl --label orthography,spelling,verb,form,tense,sva --view-id diff

Each line in dataset.jsonl has the following format.

{"accept": {"text": "This is the test."}, "reject": {"text": "This is a test."}}

How can I annotate text categories with the diff highlighting?

Hi! The --view-id argument doesn't exist on the textcat.manual recipe – see here for the available arguments.

It sounds like you want to combine the diff interface with the choice UI for multiple choice options, right? The easiest way to do this would be to use a custom recipe with blocks. Check out the docs here for details and examples: https://prodi.gy/docs/custom-interfaces#blocks You might also find my video tutorial helpful.

Your blocks could look like this:

blocks = [{"view_id": "diff"}, {"view_id": "choice"}]

And then each record in your data could look like this:

{
    "accept": {"text": "This is the test."},
    "reject": {"text": "This is a test."},
    "options": [
        {"id": "orthography", "text": "orthography"},
        # and so on...
    ]
}
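
Writing the options out by hand gets tedious with six labels. Here is a minimal sketch in plain Python, assuming the labels arrive as the same comma-separated string that was passed to --label above. The helper names `make_options` and `add_options_to_record` are hypothetical and not part of Prodigy:

```python
import copy


def make_options(label_string):
    """Turn "a,b,c" into [{"id": "a", "text": "a"}, ...] for the choice UI."""
    labels = [label.strip() for label in label_string.split(",") if label.strip()]
    return [{"id": label, "text": label} for label in labels]


def add_options_to_record(record, options):
    """Return a shallow copy of the record with its own "options" list."""
    new_record = dict(record)
    new_record["options"] = copy.deepcopy(options)
    return new_record


record = {
    "accept": {"text": "This is the test."},
    "reject": {"text": "This is a test."},
}
options = make_options("orthography,spelling,verb,form,tense,sva")
print(add_options_to_record(record, options))
```

Generating the display text from the ID also avoids typos that creep in when each option is typed twice.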

@ines
Thanks for replying and detailed information!

Following your video tutorial, I wrote a custom recipe, recipe.py, as shown below.

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string

@prodigy.recipe(
    "grammar-error",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
)
def grammar_error(dataset, source):
    stream = JSONL(source)
    blocks = [{"view_id": "diff"}, {"view_id": "choice"}]
    options = [
         {"id": "orthography", "text": "orthography"},
         {"id": "spelling", "text": "spelling"},
         {"id": "verb", "text": "verb"},
         {"id": "form", "text": "form"},
         {"id": "tense", "text": "tense"},
         {"id": "sva", "text": "sva"}
    ]
    return {
        "stream": stream,
        "dataset": dataset,
        "view_id": "blocks",
        "config": {
            "labels": options,
            "blocks": blocks
        }
    }

Then I ran the command below.

prodigy grammar-error gec dataset.jsonl -F ./recipe.py

However, the following message was displayed when I launched the server.

Oops, something went wrong:(
You might have come across a bug in Prodigy's web app – sorry about that. We'd love to fix this, so feel free to open an issue on the Prodigy Support Forum and include the steps that led to this message.

How can I fix this error?

Ah, Prodigy could probably fail more gracefully here. It looks like you forgot to actually add your options to the incoming task at "options"? Instead, you're just passing them in as the labels (which are what you'd use as the top-level labels for manual NER or image annotation).

See the docs here for how the options fit in: https://prodi.gy/docs/custom-recipes#example-choice
I'm also showing this in the video starting at ~32:00: https://youtu.be/zlyq9z7hdUA?t=1920

@ines
Hmm, I fixed the missing "options" in the stream, but something is still wrong.

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string

@prodigy.recipe(
    "grammar-error",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    )
def grammar_error(dataset, source):
    stream = JSONL(source)
    blocks = [
        {"view_id": "diff"},
        {"view_id": "choice"},
    ]
    stream = add_options(stream)
    return {
        "stream": stream,
        "dataset": dataset,
        "view_id": "blocks",
        "config": {
            "blocks": blocks,
        }
    }

def add_options(stream):
    options = [
         {"id": "orthography", "text": "orthography"},
         {"id": "spelling", "text": "spelling"},
         {"id": "verb", "text": "verb"},
         {"id": "form", "text": "form"},
         {"id": "tense", "text": "tense"},
         {"id": "sva", "text": "sva"}
    ]
    for task in stream:
        task["options"] = options
        yield task

I could launch the server successfully when I changed blocks = [{"view_id": "diff"}, {"view_id": "choice"}] to blocks = [{"view_id": "diff"}], although this shows only the diff.
Why does it fail when I combine diff and choice?

Your recipe looks good. I just had a look and it seems like the combination of choice and diff is the only one that's currently not possible because they both share and use the "accept" key differently. For the diff interface, it holds one version of the text, and for the choice interface, it holds the list of accepted options.
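
To make the clash concrete, here is a plain-Python sketch with no Prodigy calls. The two shapes follow the description above: the diff interface reads a text version from "accept", while the choice interface keeps the list of accepted option IDs there. `describe_accept` is a hypothetical helper used only for illustration:

```python
def describe_accept(task):
    """Report which interface's convention the "accept" value matches."""
    value = task.get("accept")
    if isinstance(value, dict) and "text" in value:
        return "diff"  # one version of the text
    if isinstance(value, list):
        return "choice"  # list of accepted option IDs
    return "unknown"


# What the diff interface reads from the incoming task.
diff_task = {
    "accept": {"text": "This is the test."},
    "reject": {"text": "This is a test."},
}

# What the choice interface keeps once an option has been selected.
choice_answer = {
    "options": [{"id": "tense", "text": "tense"}],
    "accept": ["tense"],
}

print(describe_accept(diff_task))      # diff
print(describe_accept(choice_answer))  # choice
```

A single task dictionary can hold only one value under "accept", so it cannot satisfy both conventions at once.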

I will fix this for the next version and make the diff UI also accept a simpler format of maybe "inserted": "..." and "deleted": "..." so it doesn't clash with any other UI.

@ines
Thanks! I'm looking forward to using the newer Prodigy :)

Just released Prodigy v1.10, which supports a simpler data format for the diff interface that doesn't clash with the format of other interfaces like choice. See here for details.
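
For existing data in the old "accept"/"reject" layout, a conversion step in the recipe might look like the sketch below. The flat key names "added" and "removed" are an assumption here (the earlier reply floated "inserted" and "deleted" as possibilities), as is the mapping of "accept" to the added text, so please check both against the v1.10 docs. `to_simple_diff` and `convert_stream` are hypothetical helpers:

```python
def to_simple_diff(task, added_key="added", removed_key="removed"):
    """Move the two text versions out of "accept"/"reject" into flat keys.

    ASSUMPTION: the key names and the accept -> added, reject -> removed
    mapping are illustrative; confirm both against the Prodigy docs.
    """
    converted = {k: v for k, v in task.items() if k not in ("accept", "reject")}
    converted[added_key] = task["accept"]["text"]
    converted[removed_key] = task["reject"]["text"]
    return converted


def convert_stream(stream, options):
    """Yield converted tasks with the choice options attached."""
    for task in stream:
        converted = to_simple_diff(task)
        converted["options"] = options
        yield converted


old_tasks = [
    {"accept": {"text": "This is the test."}, "reject": {"text": "This is a test."}}
]
options = [{"id": "tense", "text": "tense"}, {"id": "sva", "text": "sva"}]
for task in convert_stream(old_tasks, options):
    print(task)
```

With the text versions moved to their own keys, "accept" is left free for the choice interface to store the selected option IDs.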
