Edit Saved NER Manual Annotations

Hi there!

I found this post, which is similar to what I’m facing, but my situation is not quite the same:

I was using ner.manual to annotate a stream from stdin, but I can’t figure out how it would be possible for me to edit saved annotations instead of re-annotating them.

If I dump the annotations to a JSONL file using db-out to try to edit them, Prodigy ends up creating new records in the dataset with the updated annotations instead of updating the existing ones. Why is that?

And is it possible for me to use my existing stream and still edit the existing annotations?


Ideally, you should save the edited annotations to a new dataset, to make sure you’re not destroying existing records. You can always delete the old dataset afterwards, but by default, Prodigy is designed to always keep a record of each individual annotation decision – that’s also why it doesn’t just silently overwrite existing records in your dataset.
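To illustrate the "keep every decision, merge later" idea: since Prodigy stores each annotation as its own record, you can always build a "latest answer wins" view yourself afterwards. Prodigy identifies tasks by hashes like `_task_hash`, so a merge over that field works in plain Python. This is a hypothetical helper, not a Prodigy API – treat it as a sketch:

```python
import json

def latest_by_task(original, edited):
    # Merge two lists of annotation dicts, keeping the edited
    # version whenever both share the same _task_hash
    merged = {eg["_task_hash"]: eg for eg in original}
    merged.update({eg["_task_hash"]: eg for eg in edited})
    return list(merged.values())

original = [{"_task_hash": 1, "text": "Apple", "answer": "accept"},
            {"_task_hash": 2, "text": "Berlin", "answer": "accept"}]
edited = [{"_task_hash": 2, "text": "Berlin", "answer": "reject"}]

for eg in latest_by_task(original, edited):
    print(json.dumps(eg))
```

You could then save the merged result to a fresh dataset with db-in, and the old dataset still holds the full history.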


Is there some way to load annotated records to be visualized in Prodigy? I’d like to occasionally review the annotators’ progress and their conformity to the annotation specifications.


You could export a dataset as a JSONL file and then load that into ner.manual – the recipe will respect pre-set annotations, so you can review them and/or correct them if necessary. Alternatively, you could also write a script that uses db.get_examples to load the examples from a dataset, and outputs the dumped JSON so you can pipe it forward.
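A minimal version of that dump script could look like this. I haven’t run it against a real database here, so the `db.get_examples` call is shown as a comment and the JSONL-writing part uses inline sample data to keep it self-contained:

```python
import json

# In a real script you'd load from a dataset, e.g.:
#   from prodigy.components.db import connect
#   examples = connect().get_examples("my_dataset")
# Inline sample data in Prodigy's NER task format:
examples = [
    {"text": "Apple is based in Cupertino.",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}],
     "answer": "accept"},
    {"text": "Berlin is lovely in summer.",
     "spans": [{"start": 0, "end": 6, "label": "GPE"}],
     "answer": "accept"},
]

def to_jsonl(examples):
    # one JSON object per line, ready to pipe into another recipe
    return "\n".join(json.dumps(eg) for eg in examples)

print(to_jsonl(examples))
```

You could then pipe the output forward to a recipe reading from stdin, the same way you’re already streaming in new examples.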

Depending on how sophisticated you want the process to be, you could even write a recipe that loads in a dataset, streams in the whole set (or a random selection of X examples), asks you to say yes/no to the annotations (either with the option to correct, or just a binary “is the whole thing correct?”) and then outputs a score as you exit.

I haven’t tested this yet, so consider this to be semi-pseudocode – but something like this could be cool:

import prodigy
from prodigy.components.db import connect
import random

@prodigy.recipe('review-annotations',
    dataset=("Dataset name", "positional", None, str),
    n_examples=("Number of examples to randomly review, -1 for all", "option", "n", int),
    manual=("Allow manual corrections", "flag", "m", bool))
def review_annotations(dataset, n_examples=-1, manual=False):
    db = connect()
    examples = db.get_examples(dataset)
    if 0 < n_examples < len(examples):
        # get a random selection of examples
        examples = random.sample(examples, n_examples)

    # collect scores here
    scores = {'right': 0, 'wrong': 0}

    def update(examples):
        # get all accepted / rejected examples and update scores
        rejected = [eg for eg in examples if eg['answer'] == 'reject']
        accepted = [eg for eg in examples if eg['answer'] == 'accept']
        scores['wrong'] = scores['wrong'] + len(rejected)
        scores['right'] = scores['right'] + len(accepted)

    def on_exit(ctrl):
        # called when you exit the server, compile results
        total_right = scores['right']
        total_wrong = scores['wrong']
        total = max(total_right + total_wrong, 1)  # avoid dividing by zero
        print('Reviewed dataset', dataset)
        print('Correct:', total_right, '(%.2f)' % (total_right / total))
        print('Wrong:', total_wrong, '(%.2f)' % (total_wrong / total))

    return {
        'dataset': False,  # don't save review annotations
        'stream': examples,
        'view_id': 'ner_manual' if manual else 'ner',
        'update': update,
        'on_exit': on_exit
    }

Usage could then look something like:

prodigy review-annotations some_dataset -n 50 -F recipe.py

This would show you 50 annotations from the dataset to review and let you click through them to accept or reject each one. When you exit the server, it will print the totals, plus the percentage of correct and incorrect answers, respectively. So you can, for instance, immediately see that the annotator got 99% correct – or 20% wrong, according to your review.

Great, thanks for the tip!

I’ll give it a try!