Hi, I would like to ask for advice on the review recipe.
I'm at the beginning of a large annotation project (20,000 examples) with a team of annotators. We plan to annotate with ner.manual and have each example annotated twice, using group_a and group_b session IDs.
After an example has been annotated by both groups, I run the review recipe.
Is it possible to auto-accept (and add to the review dataset) all examples that the two groups annotated identically? That would save clicking through the 90% of examples where that is the case.
I'll review throughout the process. Is it possible to have review show only the examples that already have two annotations (not just one)?
What is the easiest way to do that?
I checked the custom recipes GitHub repo and read the review.py file, but I'm a bit lost on how to go about modifying the recipe.
Hi! This is actually something I've had on my list of enhancements for the built-in workflow, because I think it'd be a nice addition.
In terms of the implementation, you can think of it like this: if you're using a manual workflow like ner.manual, Prodigy's review workflow will group together all examples with the same input hash (= the same text). The different "versions" that you see in the UI are grouped by task hash (= same annotations). In the JSON data generated by the recipe, the versions are stored as "versions" and each version has a list of "sessions" (dataset or annotation session that created this annotation).
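To make the grouping concrete, here is a minimal sketch of what one merged review task might look like and how the versions and sessions can be read out of it. The text, spans and session names are invented for illustration, and summarize_versions is a hypothetical helper. Only the "versions" and "sessions" keys come from the description above, and the real tasks carry more fields than shown here.

```python
# Illustrative review task: the values are invented. Only the "versions" and
# "sessions" keys follow the structure described above.
review_task = {
    "text": "Apple opens a new office in Berlin",
    "versions": [
        {
            "spans": [{"start": 0, "end": 5, "label": "ORG"}],
            "sessions": ["ner_data-group_a", "ner_data-group_b"],
        }
    ],
}


def summarize_versions(task):
    """Return (number of distinct versions, number of sessions per version)."""
    versions = task["versions"]
    return len(versions), [len(version["sessions"]) for version in versions]


n_versions, sessions_per_version = summarize_versions(review_task)
print(n_versions, sessions_per_version)
```

One version with two sessions means both groups produced the same annotation. Two versions would mean a conflict, and one version with a single session would mean only one group has annotated the example so far.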
So in your case, you'd want to auto-save all examples with only one version (no conflicts) straight to the database – except for those that only have one session (only annotated by one person), which should be skipped and not annotated for now. In code, it could look like this (untested but should work):
    from prodigy.components.db import connect

    def filter_review_stream(stream, dataset):
        db = connect()
        for eg in stream:
            versions = eg["versions"]
            if len(versions) == 1:  # no conflicts, only one version
                sessions = versions[0]["sessions"]
                if len(sessions) > 1:  # multiple identical versions
                    # Add example to dataset automatically
                    eg["answer"] = "accept"
                    db.add_examples([eg], [dataset])
                # Otherwise only one session so far: don't send for annotation
            else:
                yield eg  # conflicting versions: send out for annotation
You can call that wrapper in your recipe right before it returns the components. If you just want to hack around, you can also run prodigy stats to find the location of your Prodigy installation and edit recipes/review.py directly.
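If you want to check the filtering logic before touching the recipe, a sketch like the following may help. It runs the same function on toy data without Prodigy installed. FakeDB is a hypothetical stand-in for the real database, the db argument is passed in explicitly instead of calling connect(), and the example tasks are invented.

```python
class FakeDB:
    """Hypothetical stand-in for the Prodigy database, for illustration only."""

    def __init__(self):
        self.saved = []

    def add_examples(self, examples, datasets):
        # The real database would store the examples in the given datasets.
        self.saved.extend(examples)


def filter_review_stream(stream, dataset, db):
    for eg in stream:
        versions = eg["versions"]
        if len(versions) == 1:  # no conflicts, only one version
            sessions = versions[0]["sessions"]
            if len(sessions) > 1:  # both groups agree: save automatically
                eg["answer"] = "accept"
                db.add_examples([eg], [dataset])
            # one version from one session: skip for now
        else:
            yield eg  # conflicting versions: send out for review


# Invented tasks covering the three cases.
stream = [
    {"text": "agreed", "versions": [{"sessions": ["group_a", "group_b"]}]},
    {"text": "conflict", "versions": [{"sessions": ["group_a"]}, {"sessions": ["group_b"]}]},
    {"text": "pending", "versions": [{"sessions": ["group_a"]}]},
]

db = FakeDB()
to_review = list(filter_review_stream(stream, "reviewed", db))
auto_accepted = [eg["text"] for eg in db.saved]
print([eg["text"] for eg in to_review], auto_accepted)
```

Note that the function is a generator, so nothing is auto-saved until the stream is actually consumed. The "pending" example is neither saved nor shown, so it can come up again in a later review run once the second group has annotated it.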
I tried that, but it causes an error: "line 240, in filter_review_stream
sessions = versions["sessions"]
TypeError: list indices must be integers or slices, not str".
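For what it's worth, the traceback shows versions being indexed with the string "sessions", while the snippet above indexes the first list item first (versions[0]["sessions"]). A minimal sketch with made-up data that reproduces the message and shows the likely fix:

```python
# Made-up data: "versions" is a list of dicts, one per distinct annotation.
versions = [{"sessions": ["group_a", "group_b"]}]

try:
    sessions = versions["sessions"]  # indexing the list with a string key
except TypeError as err:
    error_message = str(err)

# Index into the list first, then look up the key on the dict.
sessions = versions[0]["sessions"]
print(error_message)
print(sessions)
```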
Just released Prodigy v1.11, which introduces an --auto-accept flag on the review recipe. This implements pretty much the exact approach I outlined above, out of the box: https://prodi.gy/docs/recipes#review
That's awesome! I'll upgrade asap
Thanks for making v1.11 such a great update. spaCy v3 was a giant step and I'm so glad that I can use it with Prodigy now.