Edit Saved NER Manual Annotations

Hi there!

I found this post, which is similar to what I’m facing, but my situation is not quite the same:

I was using ner.manual to annotate a stream from stdin, but I can’t figure out how it would be possible for me to edit saved annotations instead of re-annotating them.

If I dump the annotations to a JSONL file using db-out to try to edit them, Prodigy ends up creating new records in the dataset with the updated annotations instead of updating the existing ones. Why is that?

And is it possible for me to use my existing stream and still edit the existing annotations?


Ideally, you should save the edited annotations to a new dataset, to make sure you’re not destroying existing records. You can always delete the old dataset afterwards, but by default, Prodigy is designed to always keep a record of each individual annotation decision – that’s also why it doesn’t just silently overwrite existing records in your dataset.
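To illustrate the "keep every decision, merge later" idea: since Prodigy stores each annotation as its own record, you can always build a "latest answer wins" view yourself afterwards. Prodigy identifies tasks by hashes like `_task_hash`, so a merge over that field works in plain Python. This is a hypothetical helper, not a Prodigy API – treat it as a sketch:

```python
import json

def latest_by_task(original, edited):
    # Merge two lists of annotation dicts, keeping the edited
    # version whenever both share the same _task_hash
    merged = {eg["_task_hash"]: eg for eg in original}
    merged.update({eg["_task_hash"]: eg for eg in edited})
    return list(merged.values())

original = [{"_task_hash": 1, "text": "Apple", "answer": "accept"},
            {"_task_hash": 2, "text": "Berlin", "answer": "accept"}]
edited = [{"_task_hash": 2, "text": "Berlin", "answer": "reject"}]

for eg in latest_by_task(original, edited):
    print(json.dumps(eg))
```

You could then save the merged result to a fresh dataset with db-in, and the old dataset still holds the full history.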


Is there some way to load annotated records to be visualized in Prodigy? I’d like to occasionally review the annotators’ progress and their conformity to the annotation specifications.


You could export a dataset as a JSONL file and then load that into ner.manual – the recipe will respect pre-set annotations, so you can review them and/or correct them if necessary. Alternatively, you could also write a script that uses db.get_examples to load the examples from a dataset, and outputs the dumped JSON so you can pipe it forward.
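A minimal version of that dump script could look like this. I haven’t run it against a real database here, so the `db.get_examples` call is shown as a comment and the JSONL-writing part uses inline sample data to keep it self-contained:

```python
import json

# In a real script you'd load from a dataset, e.g.:
#   from prodigy.components.db import connect
#   examples = connect().get_examples("my_dataset")
# Inline sample data in Prodigy's NER task format:
examples = [
    {"text": "Apple is based in Cupertino.",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}],
     "answer": "accept"},
    {"text": "Berlin is lovely in summer.",
     "spans": [{"start": 0, "end": 6, "label": "GPE"}],
     "answer": "accept"},
]

def to_jsonl(examples):
    # one JSON object per line, ready to pipe into another recipe
    return "\n".join(json.dumps(eg) for eg in examples)

print(to_jsonl(examples))
```

You could then pipe the output forward to a recipe reading from stdin, the same way you’re already streaming in new examples.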

Depending on how sophisticated you want the process to be, you could even write a recipe that loads in a dataset, streams in the whole set (or a random selection of X examples), asks you to say yes/no to the annotations (either with the option to correct, or just a binary “is the whole thing correct?”) and then outputs a score as you exit.

I haven’t tested this yet, so consider this to be semi-pseudocode – but something like this could be cool:

import prodigy
from prodigy.components.db import connect
import random

@prodigy.recipe('review-annotations',
    dataset=("Dataset name", "positional", None, str),
    n_examples=("Number of examples to randomly review, -1 for all", "option", "n", int),
    manual=("Allow manual corrections", "flag", "m", bool))
def review_annotations(dataset, n_examples=-1, manual=False):
    db = connect()
    examples = db.get_examples(dataset)
    if 0 < n_examples < len(examples):
        # get a random selection of examples
        examples = random.sample(examples, n_examples)

    # collect scores here
    scores = {'right': 0, 'wrong': 0}

    def update(examples):
        # get all accepted / rejected examples and update scores
        rejected = [eg for eg in examples if eg['answer'] == 'reject']
        accepted = [eg for eg in examples if eg['answer'] == 'accept']
        scores['wrong'] = scores['wrong'] + len(rejected)
        scores['right'] = scores['right'] + len(accepted)

    def on_exit(ctrl):
        # called when you exit the server, compile results
        total_right = scores['right']
        total_wrong = scores['wrong']
        total = max(total_right + total_wrong, 1)  # avoid dividing by zero
        print('Reviewed dataset', dataset)
        print('Correct:', total_right, '(%.2f)' % (total_right / total))
        print('Wrong:', total_wrong, '(%.2f)' % (total_wrong / total))

    return {
        'dataset': False,  # don't save review annotations
        'stream': examples,
        'view_id': 'ner_manual' if manual else 'ner',
        'update': update,
        'on_exit': on_exit
    }

Usage could then look something like:

prodigy review-annotations some_dataset -n 50 -F recipe.py

This would show you 50 annotations from the dataset to review and let you click through them to accept or reject each one. When you exit the server, it will print the totals, plus the percentage of correct and incorrect answers, respectively. So you can, for instance, immediately see that the annotator got 99% correct – or 20% wrong, according to your review.

Great, thanks for the tip!

I’ll give it a try!