Deleting examples from DB

I have an iterative workflow where I need to delete examples that were rejected. Is there an easy way to delete those examples from the DB, or should I do the following?

  1. db-out
  2. filter on accepted
  3. drop and create dataset
  4. db-in
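Step 2 of the list above could be a few lines of Python. This is only a sketch, assuming db-out has written a JSONL file in which each record may carry an "answer" key. The names filter_accepted and rewrite_export, and whatever paths you pass in, are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def filter_accepted(lines: Iterable[str]) -> List[Dict]:
    """Parse JSONL lines and keep only records whose answer is "accept"."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        # Records without an "answer" key are dropped as well.
        if record.get("answer") == "accept":
            kept.append(record)
    return kept


def rewrite_export(src: Path, dst: Path) -> int:
    """Read the export at src, write accepted records to dst, return the count."""
    with src.open(encoding="utf8") as f:
        kept = filter_accepted(f)
    with dst.open("w", encoding="utf8") as f:
        for record in kept:
            f.write(json.dumps(record) + "\n")
    return len(kept)
```

The filtered file would then be what you pass to db-in after recreating the dataset.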

Your approach seems fine, or you could write a custom recipe to do it:

from typing import List

import prodigy
from prodigy.components.db import Database, Dataset, Example, Link, connect
from prodigy.util import log, print_stats


# The recipe name "cleanup" matches the command used further down.
@prodigy.recipe("cleanup", dataset=("Dataset to clean up", "positional", None, str))
def basic_classification(dataset: str):
    DB: Database = connect()
    if dataset not in DB:
        raise ValueError(f"dataset {dataset} does not exist!")
    # Look up the dataset row, then every link that ties an example to it.
    dataset_id = Dataset.get(Dataset.name == dataset).id
    links: List[Link] = list(Link.select().where(Link.dataset == dataset_id))
    # Collect the links whose example was rejected.
    to_delete: List[Link] = []
    for link in links:
        content = link.example.load()
        if content.get("answer") == "reject":
            to_delete.append(link)
    example_ids = [l.example.id for l in to_delete]
    link_ids = [l.id for l in to_delete]
    log(f"CLEANUP: Trashing {len(example_ids)} examples and {len(link_ids)} links")
    # Move the examples to the trash first so they can be recovered later.
    trash_examples = [l.example.load() for l in to_delete]
    trash_file = DB.add_to_trash(trash_examples, dataset)
    log(f"CLEANUP: Examples moved to trash: {trash_file}")
    # Remove the links before the examples they point to.
    Link.delete().where(Link.id << link_ids).execute()
    Example.delete().where(Example.id << example_ids).execute()
    log("CLEANUP: Examples and links removed from database")
    print_stats(
        title="Trash rejected examples",
        stats={"Dataset": dataset, "Removed": len(example_ids), "Trash": trash_file},
    )

Try it with a small data.jsonl file:

{ "text": "1", "label": "TEST", "answer":"reject" }
{ "text": "2", "label": "TEST", "answer":"reject"  }
{ "text": "3", "label": "TEST" }

prodigy dataset test-dataset "test for removing rejected examples"
prodigy db-in test-dataset ./data.jsonl
prodigy cleanup test-dataset -F ./
prodigy stats test-dataset

The upside of a custom recipe is that the examples are moved to the Prodigy trash before being removed, so you can recover them if need be:

  ✨  Trash rejected examples

Dataset   test-dataset                  
Removed   2                             
Trash     /yourpath/trash/test-dataset.jsonl

Ah yes of course. I keep forgetting to use recipes for more than just labelling tasks. Thanks @justindujardin


@nix411, just curious: why would you want to delete ‘rejected’ examples? I would have thought that a mixture of ‘accepts’ and ‘rejects’ is useful for the model’s learning.


I am doing information extraction, and right now I am verifying whether the extracted information is correct.

  • accept: use the extracted information to create unit tests for my application.
  • reject: implement a fix. Rerun the classification on the failing ones.

I hope it makes sense - maybe there is a better workflow though!?
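
The accept/reject split described above could look roughly like this in Python. It is a sketch only, assuming the examples are plain dicts with "text" and "answer" keys, as in an exported dataset. The helper name split_by_answer is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_answer(examples: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (accepted, rejected); examples with any other answer are left out."""
    accepted: List[Dict] = []
    rejected: List[Dict] = []
    for eg in examples:
        answer = eg.get("answer")
        if answer == "accept":
            accepted.append(eg)
        elif answer == "reject":
            rejected.append(eg)
    return accepted, rejected


examples = [
    {"text": "1", "label": "TEST", "answer": "reject"},
    {"text": "2", "label": "TEST", "answer": "accept"},
    {"text": "3", "label": "TEST"},
]
accepted, rejected = split_by_answer(examples)
# Accepted examples would feed the unit tests; rejected texts get re-run after the fix.
texts_to_rerun = [eg["text"] for eg in rejected]
```

Keeping the rejected texts in a list like this makes it easy to queue them for another pass once the fix is in.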