I am in the midst of reviewing a number of annotated docs from 10 different raters. I am using the 'review recipe' in order to acquire a gold-standard dataset for NER. I am using the code below:
prodigy review gold-multi rater_1,rater_3,rater_4,rater_5,rater_6,rater_7,rater_8,rater_9 --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE -S -A
During the process, I encountered texts that I had difficulty resolving on my own, so I have 'Ignored' these for now. I would now like to go over all the 'Ignored' texts so that I can resolve them in discussion with my team and add them to my gold-standard dataset.
Is this possible?
I have tried flagging and ignoring the examples that I want to go over at a later stage, but I don't know how I would then go over the flagged and ignored examples and add them to the already reviewed dataset.
You may enjoy this YouTube video that gives a full demo:
That said, you can still fetch the examples that you're interested in, even if you didn't use the flagging feature. You will need to write a Python script to do that, though. In particular, you'll want to use the database's get_dataset_examples method. It will probably look something like:
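import srsly
from prodigy.components.db import connect
# Connect to the database configured for Prodigy
db = connect()
# Load every annotated example stored in the dataset
examples = db.get_dataset_examples("my_dataset_name")
# Keep only the examples that were answered with "ignore"
ignored = [e for e in examples if e['answer'] == 'ignore']
# Write them to a JSONL file so they can be revisited later
srsly.write_jsonl("flagged.jsonl", ignored)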
Note that this is also part of the beauty of Prodigy. The interface is programmatic so you're free to select/dice/slice the annotated data however you see fit.
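As an illustration of that kind of slicing, here is a minimal sketch that groups examples by their answer and picks out the flagged ones. It works on plain dictionaries, so it runs without a database. The helper names group_by_answer and flagged_only are hypothetical, and the sketch assumes that each example carries an "answer" key and that examples flagged in the UI carry a "flagged" key set to true.

```python
from collections import defaultdict


def group_by_answer(examples):
    """Group annotated examples by their "answer" value."""
    groups = defaultdict(list)
    for eg in examples:
        # Examples without an answer are collected under None.
        groups[eg.get("answer")].append(eg)
    return dict(groups)


def flagged_only(examples):
    """Return the examples marked as flagged (assumed key: "flagged")."""
    return [eg for eg in examples if eg.get("flagged")]


# Toy data standing in for the examples loaded from the database.
examples = [
    {"text": "a", "answer": "accept"},
    {"text": "b", "answer": "ignore", "flagged": True},
    {"text": "c", "answer": "ignore"},
]

groups = group_by_answer(examples)
print({answer: len(egs) for answer, egs in groups.items()})
print([eg["text"] for eg in flagged_only(examples)])
```

Either list can then be written out with srsly.write_jsonl in the same way as in the script above.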
Thank you very much for the reply.
Perhaps I didn't specify this clearly, but I was looking for a way to go over and re-review the ignored/flagged cases. That is, I have already retrieved all the ignored instances into a dataset in a manner similar to the one you described.
I was looking for a way to re-review the ignored cases and subsequently add them to the other dataset. The review recipe seems to only let you specify multiple datasets, as opposed to running it on a single, already-reviewed dataset.
Luckily, I found a solution after some additional searching around. For anyone who might come across this post, this is the solution:
Python:
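from prodigy.components.db import connect
# Connect to the database configured for Prodigy
db = connect()
# Load all examples from the already-reviewed dataset
examples = db.get_dataset("gold-multi-all")
# Copy the accepted examples into their own dataset
accepted = [e for e in examples if e["answer"] == "accept"]
db.add_examples(accepted, ["gold-multi-accepted"])
# Copy the ignored examples into a separate dataset for re-review
ignored = [e for e in examples if e["answer"] == "ignore"]
db.add_examples(ignored, ["gold-multi-ignored"])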
CLI:
prodigy mark gold_multi_ignored_resolved dataset:gold-multi-ignored --view-id review
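Once the ignored examples have been resolved, they still need to end up alongside the accepted ones. Below is a minimal sketch of that merge, shown on plain dictionaries rather than the database. The helper merge_resolved is hypothetical, and the sketch assumes that every example carries an "_input_hash" key identifying its text and that only resolved examples answered with "accept" should be added.

```python
def merge_resolved(accepted, resolved):
    """Combine accepted and newly resolved examples, one per _input_hash."""
    # Index the existing gold examples by the hash of their input text.
    merged = {eg["_input_hash"]: eg for eg in accepted}
    for eg in resolved:
        # Only keep resolved examples that were accepted this time round.
        if eg.get("answer") == "accept":
            merged[eg["_input_hash"]] = eg
    return list(merged.values())


# Toy data standing in for the two datasets.
accepted = [{"_input_hash": 1, "text": "a", "answer": "accept"}]
resolved = [
    {"_input_hash": 2, "text": "b", "answer": "accept"},
    {"_input_hash": 3, "text": "c", "answer": "ignore"},
]

gold = merge_resolved(accepted, resolved)
print([eg["text"] for eg in gold])
```

The merged list could then be saved with db.add_examples, as in the script above, to whichever dataset holds the final gold standard.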