I have a situation where 3 annotators have annotated the same batch of 50 sentences using
These annotations have been reviewed and validated using [review](https://prodi.gy/docs/recipes#review) and saved to a new database.
A new annotator has since annotated the same sentences, and the validator would like to review the new annotations against the annotations saved in the database. However, this raises an error:
✘ Conflicting view_id values in datasets
Can't review annotations of 'ner_manual' (in dataset 'dummy_moderated') and
'review' (in previous examples)
Why is this? Given that the annotations have been reviewed, why can't the new annotations be reviewed against this dataset?
Thanks for the question!
The problem is that the dataset output by `review` (let's call it `r1`; you may have called it `dummy_moderated`) nests the original annotations along with the reviewer's annotation. The original annotations have `view_id == 'ner_manual'` while the reviewer annotations have `view_id == 'review'`. This causes a conflict. You can see this if you run only `prodigy review new_dataset r1`.
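If you want to confirm which view IDs a dataset carries before handing it to `review`, a quick count can help. This is only a sketch: the helper `count_view_ids` and the trimmed sample records are hypothetical, and it assumes each stored example keeps its view ID under a top-level `_view_id` key, as the filter further down in this answer does.

```python
from collections import Counter


def count_view_ids(examples):
    """Count how often each top-level `_view_id` value occurs."""
    return Counter(eg.get("_view_id", "<missing>") for eg in examples)


# Hypothetical, heavily trimmed records standing in for real Prodigy examples.
sample = [
    {"text": "First sentence.", "_view_id": "ner_manual"},
    {"text": "First sentence.", "_view_id": "review"},
    {"text": "Second sentence.", "_view_id": "review"},
]

counts = count_view_ids(sample)
print(dict(counts))

# With a real database (requires Prodigy), something like:
# from prodigy.components.db import connect
# print(count_view_ids(connect().get_dataset("r1")))
```

If more than one value shows up across the datasets you pass to `review`, that is the mismatch the error message is complaining about.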
One solution is a bit hacky but seemed to work for me. You can filter the `review` dataset down to only the reviewer's annotations and set their `view_id` to `'ner_manual'` so that the reviewed annotations have the same `view_id` as your original annotations. Suppose you have five existing Prodigy datasets:
a1: first annotator 50 annotations
a2: second annotator 50 annotations
a3: third annotator 50 annotations
r1: reviewer annotations / output (dataset) from `prodigy review r1 a1,a2,a3`
a4: fourth annotator 50 annotations
Get the review dataset (`db.get_dataset()`) and pass it to clumper to filter, mutate, and export the result to a `.jsonl` file. Alternatively, you could use `db.add_dataset()` instead of exporting to the `.jsonl` file.
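# pip install clumper
from clumper import Clumper
from prodigy.components.db import connect

# Load every example stored in the review dataset.
db = connect()
reviews = db.get_dataset("r1")

# Keep only the reviewer's annotations and relabel their view ID.
review_only = (
    Clumper(reviews)
    .keep(lambda d: d['_view_id'] == 'review')
    .mutate(_view_id=lambda d: 'ner_manual')
)

# Export the filtered examples for the db-in step.
review_only.write_jsonl("review-only.jsonl")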
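If you would rather not install clumper, the same filter and rename can be sketched with the standard library only. The helper names `keep_review_only` and `write_jsonl` and the sample records are my own for illustration; the only assumption about Prodigy's data is the top-level `_view_id` key used above.

```python
import json


def keep_review_only(examples, new_view_id="ner_manual"):
    """Return relabelled copies of the examples whose `_view_id` is 'review'."""
    return [
        {**eg, "_view_id": new_view_id}  # copy, so the input list is untouched
        for eg in examples
        if eg.get("_view_id") == "review"
    ]


def write_jsonl(examples, path):
    """Write one JSON object per line, the format `prodigy db-in` reads."""
    with open(path, "w", encoding="utf-8") as f:
        for eg in examples:
            f.write(json.dumps(eg, ensure_ascii=False) + "\n")


# Hypothetical, heavily trimmed records standing in for db.get_dataset("r1").
sample = [
    {"text": "First sentence.", "_view_id": "ner_manual"},
    {"text": "First sentence.", "_view_id": "review"},
    {"text": "Second sentence.", "_view_id": "review"},
]
review_only = keep_review_only(sample)

# With real data you would then export the result:
# write_jsonl(review_only, "review-only.jsonl")
```

Using `eg.get("_view_id")` rather than `eg["_view_id"]` just avoids a `KeyError` if some record happens to lack the key.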
Then create a new dataset based on `review-only.jsonl`:
prodigy db-in r2 review-only.jsonl
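For the `db.add_dataset()` route mentioned earlier, which skips the export and `db-in` steps, a rough sketch could look like the following. The function `copy_review_only` is hypothetical. It assumes `db` is the object returned by `connect()`, that `db.add_examples(examples, datasets=[...])` is available alongside `db.add_dataset()`, and that examples loaded from an existing dataset already carry the hashes Prodigy expects, so please check it against your Prodigy version before relying on it.

```python
def copy_review_only(db, source, target, new_view_id="ner_manual"):
    """Copy only the reviewer's annotations from `source` into `target`.

    `db` is assumed to behave like Prodigy's database object
    (get_dataset / add_dataset / add_examples).
    """
    examples = db.get_dataset(source) or []
    relabelled = [
        {**eg, "_view_id": new_view_id}
        for eg in examples
        if eg.get("_view_id") == "review"
    ]
    db.add_dataset(target)  # create the target dataset
    db.add_examples(relabelled, datasets=[target])
    return len(relabelled)


# Usage (requires Prodigy):
# from prodigy.components.db import connect
# n = copy_review_only(connect(), "r1", "r2")
# print(f"Copied {n} reviewed examples into r2")
```

Either way you end up with a dataset `r2` that holds just the reviewed annotations under the `ner_manual` view ID.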
And now you should be able to run:
prodigy review r3 r2,a4
Let me know if this doesn't work. I can see the challenge in this and I've made a note. Perhaps in the future there could be an argument for `review` that outputs only the reviewed annotations (alongside the original `review` output). That would solve this problem. Thank you again for your question!
Thank you very much @ryanwesslen for the detailed response.
I had anticipated this and was exploring a hacky solution, but just wanted to check that there wasn't an existing recipe to do what I wanted to do.
Your instruction is very clear and well laid out. Thank you for your efforts here.
I will update you on my progress.