Review recipe: Ignore for now, but go over later.

Hi everyone,

I am in the midst of reviewing a number of annotated docs from 10 different raters. I am using the 'review' recipe to build a gold-standard dataset for NER, with the command below:

prodigy review gold-multi rater_1,rater_3,rater_4,rater_5,rater_6,rater_7,rater_8,rater_9 --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE -S -A

During the process, I have encountered texts that I had difficulty resolving on my own. I have 'Ignored' these for now. However, I would now like to go over all the 'Ignored' texts, so that I can resolve them in discussion with my team and add them to my gold-standard dataset.

Is this possible?

I have tried flagging and ignoring the examples that I want to go over at a later stage, but I don't know how I would then go over the flagged and ignored examples and add them to the already reviewed dataset.

Thank you very much in advance

You may want to use the flagging feature for this in the future. Flagged examples can be pulled from the database directly via the db-out command.

python -m prodigy db-out <dataset_name> --flagged-only

That said, you can still fetch the examples you're interested in, even if you didn't use the flagging feature. You will need to write a Python script to do that, though. In particular, you'll want to use the database's get_dataset_examples method. It will probably look something like:

import srsly
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset_examples("my_dataset_name")

ignored = [e for e in examples if e['answer'] == 'ignore']
srsly.write_jsonl("ignored.jsonl", ignored)

Note that this is also part of the beauty of Prodigy. The interface is programmatic so you're free to select/dice/slice the annotated data however you see fit.
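
As a rough illustration of that kind of slicing, here is a minimal sketch that works on plain dictionaries rather than a live database. It assumes each example carries an "answer" key and, when it was flagged in the UI, a "flagged": True entry. The select_examples helper and the sample records are hypothetical and only stand in for what db.get_dataset_examples(...) would return.

```python
from collections import Counter


def select_examples(examples, answer=None, flagged=None):
    """Return the examples matching the given answer and/or flagged state.

    A filter left as None is not applied.
    """
    selected = []
    for eg in examples:
        if answer is not None and eg.get("answer") != answer:
            continue
        # Unflagged examples are assumed to simply lack the "flagged" key.
        if flagged is not None and bool(eg.get("flagged", False)) != flagged:
            continue
        selected.append(eg)
    return selected


# Made-up records standing in for the output of db.get_dataset_examples(...).
examples = [
    {"text": "Apple hired Tim.", "answer": "accept"},
    {"text": "Paris is lovely.", "answer": "ignore"},
    {"text": "The Fed met on Monday.", "answer": "ignore", "flagged": True},
    {"text": "Not sure about this.", "answer": "reject", "flagged": True},
]

answer_counts = Counter(eg.get("answer") for eg in examples)
ignored = select_examples(examples, answer="ignore")
flagged = select_examples(examples, flagged=True)
to_discuss = select_examples(examples, answer="ignore", flagged=True)

print(answer_counts)
print(len(ignored), len(flagged), len(to_discuss))
```

Any of the resulting lists could then be written out with srsly.write_jsonl, as in the snippet above.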

Hi Vincent,

Thank you very much for your reply.
Perhaps I didn't specify this clearly, but I was looking for a way to go over and re-review the ignored/flagged cases, i.e. I have already retrieved all the ignored instances into a dataset in a manner similar to the one you described.

I was looking for a way to re-review the ignored cases and subsequently add them to the other dataset. The review recipe seems to only let you specify multiple datasets, as opposed to running it on a single, already-reviewed dataset.

Luckily, I found a solution after some additional searching around. For anyone who might come across this post, this is the solution:


from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("gold-multi-all")

accepted = [e for e in examples if e["answer"] == "accept"]
db.add_examples(accepted, ["gold-multi-accepted"])

ignored = [e for e in examples if e["answer"] == "ignore"]
db.add_examples(ignored, ["gold-multi-ignored"])


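Then, on the command line, re-review the ignored examples and merge the result with the accepted ones:

prodigy mark gold_multi_ignored_resolved dataset:gold-multi-ignored --view-id review
prodigy db-merge gold-multi-accepted,gold_multi_ignored_resolved gold-multi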
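
Before merging, it may be worth checking whether every ignored example actually ended up with an accepted answer after the re-review. The sketch below is one possible way to do that on plain dictionaries. It assumes the re-reviewed examples keep the same "_input_hash" as the ignored originals, which should be verified on real data. The unresolved_input_hashes helper and the sample records are hypothetical.

```python
def unresolved_input_hashes(ignored, resolved):
    """Return the input hashes that were ignored but have no accepted re-review."""
    accepted = {
        eg["_input_hash"] for eg in resolved if eg.get("answer") == "accept"
    }
    return sorted({eg["_input_hash"] for eg in ignored} - accepted)


# Made-up records standing in for the "gold-multi-ignored" dataset.
ignored = [
    {"_input_hash": 101, "answer": "ignore"},
    {"_input_hash": 102, "answer": "ignore"},
    {"_input_hash": 103, "answer": "ignore"},
]

# Made-up records standing in for "gold_multi_ignored_resolved".
resolved = [
    {"_input_hash": 101, "answer": "accept"},
    {"_input_hash": 102, "answer": "ignore"},  # skipped again during re-review
]

remaining = unresolved_input_hashes(ignored, resolved)
print(remaining)
```

With real data, the two lists would come from the database (for example via db.get_dataset_examples), and an empty result would suggest that nothing is left to discuss.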
