Hello,
I have a situation where 3 annotators have annotated the same batch of 50 sentences using ner.manual. These annotations have been reviewed and validated using [review](https://prodi.gy/docs/recipes#review) and saved to a new database.
A new annotator has annotated the same sentences, and the validator would like to review the new annotations with the annotations saved in the database. However, there is an error.
✘ Conflicting view_id values in datasets
Can't review annotations of 'ner_manual' (in dataset 'dummy_moderated') and
'review' (in previous examples)
Why is this? Given that the annotations have been reviewed, why can't the new annotations be reviewed against this dataset?
Hi @rory-hurley-gds!
Thanks for the question!
The problem is that the dataset output by review (let's call it r1; you may have called it dummy_moderated) nests the original annotations along with the reviewer's annotations. The original annotations have view_id == 'ner_manual' while the reviewer annotations have view_id == 'review'. This causes a conflict. You can see this by running review on r1 alone, as prodigy review new_dataset r1.
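To make the conflict concrete, here is a minimal sketch that walks a record and collects every _view_id it finds, including nested ones. The record below is a hypothetical, heavily trimmed stand-in (the "versions" key and its layout are assumptions for illustration, not a dump of a real review record), and this is not how the review recipe performs its own check.

```python
def collect_view_ids(node) -> set:
    """Recursively collect every `_view_id` value in nested dicts/lists."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_view_id":
                found.add(value)
            else:
                found |= collect_view_ids(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_view_ids(item)
    return found


# Hypothetical review-style record: the reviewer's decision at the top level,
# with an original annotation nested underneath (assumed layout).
r1 = [
    {
        "text": "Sentence one.",
        "_view_id": "review",
        "versions": [{"text": "Sentence one.", "_view_id": "ner_manual"}],
    }
]

view_ids = collect_view_ids(r1)
has_conflict = len(view_ids) > 1  # more than one view_id in a single dataset
print(sorted(view_ids), has_conflict)
```

If your own r1 looks different, running the same helper over the output of db.get_dataset("r1") should show which view IDs it actually contains.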
One solution is a bit hacky but seemed to work for me. You can keep only the reviewer's annotations from the review dataset and set their view_id to 'ner_manual' so that the reviewed annotations have the same view_id as your original annotations. Suppose you have five existing Prodigy datasets:
a1: first annotator, 50 annotations
a2: second annotator, 50 annotations
a3: third annotator, 50 annotations
r1: reviewer annotations, i.e. the output dataset from prodigy review r1 a1,a2,a3
a4: fourth annotator, 50 annotations
Get the review dataset (r1) via db.get_dataset() and pass it to clumper to filter, mutate, and export the records to a .jsonl file. Alternatively, you could write them straight to a new dataset with db.add_dataset() and db.add_examples() instead of exporting to a .jsonl file.
from prodigy.components.db import connect
# pip install clumper
from clumper import Clumper

# Connect to the Prodigy database and load the review output
db = connect()
reviews = db.get_dataset("r1")

# Keep the reviewer's records and relabel them as ner_manual
review_only = (
    Clumper(reviews)
    .keep(lambda d: d["_view_id"] == "review")
    .mutate(_view_id=lambda d: "ner_manual")
)
review_only.write_jsonl("review-only.jsonl")
Then create a new dataset based on review-only.jsonl:
prodigy db-in r2 review-only.jsonl
And now you should be able to run:
prodigy review r3 r2,a4
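If you would rather not install clumper, the filter-and-rename step can also be sketched in plain Python. This is only a rough equivalent under the same assumptions as above: the reviews list here is a hypothetical stand-in for the output of db.get_dataset("r1"), and reviewer_only and write_jsonl are helper names made up for this example.

```python
import json
from pathlib import Path


def reviewer_only(examples, new_view_id="ner_manual"):
    """Keep records whose `_view_id` is 'review' and relabel copies of them."""
    return [
        {**eg, "_view_id": new_view_id}  # shallow copy, original left untouched
        for eg in examples
        if eg.get("_view_id") == "review"
    ]


def write_jsonl(examples, path):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")


# Hypothetical stand-in for db.get_dataset("r1"); real records have more keys.
reviews = [
    {"text": "Sentence one.", "_view_id": "review"},
    {"text": "Sentence two.", "_view_id": "ner_manual"},
]

review_only = reviewer_only(reviews)
write_jsonl(review_only, "review-only.jsonl")
```

The resulting file can then be loaded with the same prodigy db-in command as before.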
Let me know if this doesn't work. I can see the challenge in this and I've made a note. Perhaps in the future review could take an argument that outputs only the reviewed annotations (along with the original view_id, not review). That would solve this problem. Thank you again for your question!
Thank you very much @ryanwesslen for the detailed response.
I had anticipated this and was exploring a hacky solution, but just wanted to check that there wasn't an existing recipe for what I wanted to do.
Your instruction is very clear and well laid out. Thank you for your efforts here.
I will update you on my progress.
Thanks,
Rory