Review recipe: Ignore for now, but go over later.

emiltj · January 19, 2023, 7:18am

Hi everyone,

I am in the midst of reviewing a number of annotated docs from 10 different raters. I am using the 'review recipe' in order to acquire a gold-standard dataset for NER. I am using the code below:

prodigy review gold-multi rater_1,rater_3,rater_4,rater_5,rater_6,rater_7,rater_8,rater_9 --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE -S -A

During the process, I have encountered texts that I had difficulty resolving on my own. I have 'Ignored' these for now. However, I would now like to go over all the 'Ignored' texts, so that I can resolve them in discussion with my team and add them to my gold-standard dataset.

Is this possible?

I have tried flagging and ignoring the examples that I want to go over at a later stage, but I don't know how I would then go over the flagged and ignored examples and add them to the already reviewed dataset.

Thank you very much in advance

koaning · January 20, 2023, 3:25pm

You may want to use the flagging feature for this in the future. Flagged examples can be pulled from the database directly via the db-out command.

python -m prodigy db-out <dataset_name> --flagged-only

You may enjoy this YouTube video that gives a full demo:

That said, you can still fetch the examples that you're interested in, even if you didn't use the flagging feature. You will need to write a Python script to do that, though. In particular, you'll want to use the get_dataset_examples method of the database object. It will probably look something like:

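import srsly
from prodigy.components.db import connect

# Connect to the database that Prodigy is configured to use
db = connect()
examples = db.get_dataset_examples("my_dataset_name")

# Keep only the examples that were answered with "ignore"
ignored = [e for e in examples if e["answer"] == "ignore"]
srsly.write_jsonl("ignored.jsonl", ignored)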

Note that this is also part of the beauty of Prodigy. The interface is programmatic so you're free to select/dice/slice the annotated data however you see fit.
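
As a rough sketch of that kind of slicing without a database connection, the snippet below filters annotations that were exported to a JSONL file (for example with db-out). It assumes each record has an "answer" key, and that flagged records carry a truthy "flagged" key; the helper names and the file name in the usage note are hypothetical and not part of Prodigy.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one annotation dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def needs_second_look(example):
    """Return True if the example was ignored or carries a truthy "flagged" key.

    Assumption: flagged records are marked with "flagged": true.
    """
    return example.get("answer") == "ignore" or bool(example.get("flagged"))


def select_for_discussion(examples):
    """Collect every example that should be revisited with the team."""
    return [eg for eg in examples if needs_second_look(eg)]
```

For an export named annotations.jsonl (a placeholder), this could be called as select_for_discussion(read_jsonl("annotations.jsonl")).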

emiltj · January 21, 2023, 10:20am

Hi Vincent,

Thank you very much for your reply.
Perhaps I didn't specify this clearly, but I was looking for a way to go over and re-review the ignored/flagged cases. That is, I have already retrieved all the ignored instances into a dataset in a manner similar to the one you described.

I was looking for a way to re-review the ignored cases and subsequently add them to the other dataset. The review recipe seems to only let you specify multiple datasets, as opposed to running it on a single, already-reviewed dataset.

However, luckily I found a solution after some additional searching around. For anyone who might come across the post, this is the solution:

Python:

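from prodigy.components.db import connect

db = connect()
# Load every example from the dataset produced by the review session
examples = db.get_dataset("gold-multi-all")

# Add the accepted examples to their own dataset
accepted = [e for e in examples if e["answer"] == "accept"]
db.add_examples(accepted, ["gold-multi-accepted"])

# Add the ignored examples to a separate dataset for re-review
ignored = [e for e in examples if e["answer"] == "ignore"]
db.add_examples(ignored, ["gold-multi-ignored"])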

CLI:

prodigy mark gold_multi_ignored_resolved dataset:gold-multi-ignored --view-id review

prodigy db-merge gold-multi-accepted,gold_multi_ignored_resolved gold-multi
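
As an optional sanity check after the merge, something like the following sketch could confirm that no text was lost along the way. It works on plain lists of example dicts (for instance, loaded from the database or from a JSONL export) and assumes each example carries the "_input_hash" and "answer" keys; the function names are hypothetical and not part of Prodigy.

```python
from collections import Counter


def summarize(examples):
    """Count examples per answer value; examples without one count as "missing"."""
    return Counter(eg.get("answer", "missing") for eg in examples)


def unresolved_inputs(original, merged):
    """Return the sorted _input_hash values from `original` that have
    no accepted example in `merged`."""
    resolved = {eg["_input_hash"] for eg in merged if eg.get("answer") == "accept"}
    return sorted({eg["_input_hash"] for eg in original} - resolved)
```

An empty list from unresolved_inputs would suggest that every text in the original review dataset ended up with an accepted annotation in the merged one; any hashes it returns point to texts that still need a decision.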
