Review recipe: auto accept identical annotations

psimm · June 1, 2021, 1:52pm

Hi, I would like to ask for advice on the review recipe.

I'm at the beginning of a large annotation project (20,000 examples) with a team of annotators. We plan to annotate using ner.manual and to annotate each example twice, using a group_a and a group_b session ID.

After an example has been annotated by both groups, I run the review recipe.

Is it possible to auto-accept (and add to review dataset) all annotations that were annotated identically by the two groups? That would save clicking through the 90% of examples where that was the case.
I'll review throughout the process. Is it possible to let review only show the examples that already have 2 annotations (not just 1)?

What is the easiest way to do that?

I checked the custom recipes GitHub repo and read the review.py file, but I'm a bit lost as to how I would go about modifying the recipe.

Any hints would be much appreciated.

Best regards
Paul

ines · June 2, 2021, 2:29am

Hi! This is actually something I've had on my list of enhancements for the built-in workflow, because I think it'd be a nice addition.

In terms of the implementation, you can think of it like this: if you're using a manual workflow like ner.manual, Prodigy's review workflow will group together all examples with the same input hash (= the same text). The different "versions" that you see in the UI are grouped by task hash (= same annotations). In the JSON data generated by the recipe, the versions are stored as "versions" and each version has a list of "sessions" (dataset or annotation session that created this annotation).
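To make that structure concrete, here is a minimal sketch in plain Python of how such a grouping could be reproduced outside Prodigy. It is not the recipe's actual implementation. The keys "versions" and "sessions" come from the description above; the helper name group_versions and the sample data are made up for illustration, and the sketch assumes each annotated example carries "_input_hash", "_task_hash" and "_session_id" values, with the task hash differing whenever the annotations differ.

```python
from collections import defaultdict


def group_versions(examples):
    """Group annotated examples by input hash, then by task hash (illustrative only)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for eg in examples:
        grouped[eg["_input_hash"]][eg["_task_hash"]].append(eg)
    merged = []
    for input_hash, by_task in grouped.items():
        # One "version" per distinct annotation, listing the sessions that produced it
        versions = [
            {"spans": egs[0]["spans"], "sessions": [eg["_session_id"] for eg in egs]}
            for egs in by_task.values()
        ]
        first = next(iter(by_task.values()))[0]
        merged.append({"text": first["text"], "_input_hash": input_hash, "versions": versions})
    return merged


# Hypothetical sample data: hashes and session names are invented
berlin = [{"start": 0, "end": 6, "label": "LOC"}]
annotations = [
    {"text": "Berlin is nice", "_input_hash": 1, "_task_hash": 10,
     "_session_id": "group_a", "spans": berlin},
    {"text": "Berlin is nice", "_input_hash": 1, "_task_hash": 10,
     "_session_id": "group_b", "spans": berlin},
    {"text": "Paul met Ines", "_input_hash": 2, "_task_hash": 20,
     "_session_id": "group_a", "spans": [{"start": 0, "end": 4, "label": "PERSON"}]},
    {"text": "Paul met Ines", "_input_hash": 2, "_task_hash": 21,
     "_session_id": "group_b", "spans": []},
]
merged = group_versions(annotations)
```

In this sample, the first text ends up with a single version created by two sessions (both groups agree), while the second text has two versions with one session each (a conflict to resolve in the UI).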

So in your case, you'd want to auto-save all examples with only one version (no conflicts) straight to the database, except for those whose single version has only one session (annotated by only one person so far), which should be skipped and not sent out for review yet. In code, it could look like this (untested but should work):

from prodigy.components.db import connect

def filter_review_stream(stream, dataset):
    db = connect()  # connect to the database configured for Prodigy
    for eg in stream:
        versions = eg["versions"]
        if len(versions) == 1:  # no conflicts, only one version
            sessions = versions[0]["sessions"]
            if len(sessions) > 1:  # same annotation created by several sessions
                # Add example to dataset automatically
                eg["answer"] = "accept"
                db.add_examples([eg], [dataset])
            # otherwise only one session so far: skip, don't send for annotation
        else:
            yield eg  # conflicting versions: send out for annotation

You can call that wrapper in your recipe right before it returns the components. If you just want to hack around, you can also run prodigy stats to find the location of your Prodigy installation and edit recipes/review.py directly.
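To try out the filtering logic without a running Prodigy instance, a sketch like the following may help. It differs from the function above in that the database object is passed in as a parameter instead of being created with connect(). FakeDB is a hypothetical stand-in that only mimics the add_examples(examples, datasets) call used above; it is not part of Prodigy, and the sample stream is invented.

```python
class FakeDB:
    """Hypothetical in-memory stand-in for the database, for testing only."""

    def __init__(self):
        self.saved = {}

    def add_examples(self, examples, datasets):
        for name in datasets:
            self.saved.setdefault(name, []).extend(examples)


def filter_review_stream(stream, dataset, db):
    for eg in stream:
        versions = eg["versions"]
        if len(versions) == 1:  # no conflicts
            if len(versions[0]["sessions"]) > 1:  # agreed on by several sessions
                eg["answer"] = "accept"
                db.add_examples([eg], [dataset])
            # single session only: neither saved nor shown
        else:
            yield eg  # conflicting versions go to the reviewer


stream = [
    {"text": "agreed", "versions": [{"sessions": ["group_a", "group_b"]}]},
    {"text": "single", "versions": [{"sessions": ["group_a"]}]},
    {"text": "conflict", "versions": [{"sessions": ["group_a"]}, {"sessions": ["group_b"]}]},
]
db = FakeDB()
to_review = list(filter_review_stream(stream, "gold", db))
```

Note that the function is a generator, so nothing is saved until the stream is actually consumed: agreed examples are written to the dataset only as the review app (or, here, list()) pulls items from it.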

psimm · June 2, 2021, 8:12am

That's awesome, thank you for the explanation and the code.

I'm unsure how to use that in the recipe function. I put your function into review.py. Then in the review function, I called it:

filtered_stream = filter_review_stream(stream, dataset)

return {
    "view_id": "review",
    "dataset": dataset,
    "stream": filtered_stream,
    "before_db": before_db,
    "config": config,
}

but that causes an error: "line 240, in filter_review_stream
sessions = versions["sessions"]
TypeError: list indices must be integers or slices, not str".

ines · June 2, 2021, 9:55am

Yes, that looks correct!

Ah, sorry, I think this is just a typo and should be versions[0]["sessions"], since we're looking at the sessions of the one (and only) version here.

psimm · June 2, 2021, 1:55pm

It works! Thanks again, this saved me from going through thousands of identical annotations.

ines · August 12, 2021, 11:41am

Just released Prodigy v1.11, which introduces an --auto-accept flag on the review recipe. This implements pretty much the exact approach I outlined above, out of the box: https://prodi.gy/docs/recipes#review

psimm · August 12, 2021, 1:41pm

That's awesome! I'll upgrade asap
Thanks for making v1.11 such a great update. spaCy v3 was a giant step and I'm so glad that I can use it with Prodigy now.
