Edit Saved NER Manual Annotations

Hi there!

I found this post, which is similar to what I’m facing, but my situation is not quite the same:

I was using ner.manual to annotate a stream from stdin, but I can’t figure out how it would be possible for me to edit saved annotations instead of re-annotating them.

If I dump the annotations to a JSONL file using db-out to try to edit them, Prodigy ends up creating new records in the dataset with the updated annotations instead of updating the existing ones. Why is that?

And is it possible for me to use my existing stream and still edit the existing annotations?


Ideally, you should save the edited annotations to a new dataset, to make sure you’re not destroying existing records. You can always delete the old dataset afterwards, but by default, Prodigy is designed to always keep a record of each individual annotation decision – that’s also why it doesn’t just silently overwrite existing records in your dataset.
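To illustrate the "keep every decision, merge later" idea: since Prodigy stores each annotation as its own record, you can always build a "latest answer wins" view yourself afterwards. Prodigy identifies tasks by hashes like `_task_hash`, so a merge over that field works in plain Python. This is a hypothetical helper, not a Prodigy API – treat it as a sketch:

```python
import json

def latest_by_task(original, edited):
    # Merge two lists of annotation dicts, keeping the edited
    # version whenever both share the same _task_hash
    merged = {eg["_task_hash"]: eg for eg in original}
    merged.update({eg["_task_hash"]: eg for eg in edited})
    return list(merged.values())

original = [{"_task_hash": 1, "text": "Apple", "answer": "accept"},
            {"_task_hash": 2, "text": "Berlin", "answer": "accept"}]
edited = [{"_task_hash": 2, "text": "Berlin", "answer": "reject"}]

for eg in latest_by_task(original, edited):
    print(json.dumps(eg))
```

You could then save the merged result to a fresh dataset with db-in, and the old dataset still holds the full history.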


Is there some way to load annotated records to be visualized in Prodigy? I’d like to occasionally review the annotators’ progress and their conformity to the annotation specifications.


You could export a dataset as a JSONL file and then load that into ner.manual – the recipe will respect pre-set annotations, so you can review them and/or correct them if necessary. Alternatively, you could also write a script that uses db.get_examples to load the examples from a dataset, and outputs the dumped JSON so you can pipe it forward.
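A minimal version of that dump script could look like this. I haven’t run it against a real database here, so the `db.get_examples` call is shown as a comment and the JSONL-writing part uses inline sample data to keep it self-contained:

```python
import json

# In a real script you'd load from a dataset, e.g.:
#   from prodigy.components.db import connect
#   examples = connect().get_examples("my_dataset")
# Inline sample data in Prodigy's NER task format:
examples = [
    {"text": "Apple is based in Cupertino.",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}],
     "answer": "accept"},
    {"text": "Berlin is lovely in summer.",
     "spans": [{"start": 0, "end": 6, "label": "GPE"}],
     "answer": "accept"},
]

def to_jsonl(examples):
    # one JSON object per line, ready to pipe into another recipe
    return "\n".join(json.dumps(eg) for eg in examples)

print(to_jsonl(examples))
```

You could then pipe the output forward to a recipe reading from stdin, the same way you’re already streaming in new examples.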

Depending on how sophisticated you want the process to be, you could even write a recipe that loads in a dataset, streams in the whole set (or a random selection of X examples), asks you to say yes/no to the annotations (either with the option to correct, or just a binary “is the whole thing correct?”) and then outputs a score as you exit.

I haven’t tested this yet, so consider this to be semi-pseudocode – but something like this could be cool:

import prodigy
from prodigy.components.db import connect
import random

@prodigy.recipe('review-annotations',
    dataset=("Dataset name", "positional", None, str),
    n_examples=("Number of examples to randomly review, -1 for all", "option", "n", int),
    manual=("Allow manual corrections", "flag", "m", bool))
def review_annotations(dataset, n_examples=-1, manual=False):
    db = connect()
    examples = db.get_examples(dataset)
    if 0 < n_examples < len(examples):
        # get a random selection of examples
        examples = random.sample(examples, n_examples)

    # collect scores here
    scores = {'right': 0, 'wrong': 0}

    def update(examples):
        # get all accepted / rejected examples and update scores
        rejected = [eg for eg in examples if eg['answer'] == 'reject']
        accepted = [eg for eg in examples if eg['answer'] == 'accept']
        scores['wrong'] = scores['wrong'] + len(rejected)
        scores['right'] = scores['right'] + len(accepted)

    def on_exit(ctrl):
        # called when you exit the server, compile results
        total_right = scores['right']
        total_wrong = scores['wrong']
        total = max(total_right + total_wrong, 1)  # avoid dividing by zero
        print('Reviewed dataset', dataset)
        print('Correct:', total_right, '(%.2f)' % (total_right / total))
        print('Wrong:', total_wrong, '(%.2f)' % (total_wrong / total))

    return {
        'dataset': False,  # don't save review annotations
        'stream': examples,
        'view_id': 'ner_manual' if manual else 'ner',
        'update': update,
        'on_exit': on_exit
    }

Usage could then look something like:

prodigy review-annotations some_dataset -n 50 -F recipe.py

This would show you 50 annotations from the dataset to review and let you click through them to accept or reject each one. When you exit the server, it will print the totals, plus the percentage of correct and incorrect answers, respectively. So you can, for instance, immediately see that the annotator got 99% correct – or 20% wrong, according to your review.

Great, thanks for the tip!

I’ll give it a try!