Deleting examples from DB

I have an iterative workflow process where I need to delete examples that were rejected. Is it possible to easily delete those examples from the DB or should I do the following?

  1. db-out
  2. filter on accepted
  3. drop and create dataset
  4. db-in
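
The filter step in the list above could be sketched roughly like this. It assumes the `db-out` export is a JSONL file in which each task carries an `"answer"` key; `filter_accepted` is a hypothetical helper name, not part of Prodigy.

```python
import json
from typing import Iterable, List


def filter_accepted(jsonl_lines: Iterable[str]) -> List[str]:
    """Keep only the tasks whose answer is "accept" (hypothetical helper)."""
    kept = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        task = json.loads(line)
        # Tasks without an "answer" key are dropped as well.
        if task.get("answer") == "accept":
            kept.append(json.dumps(task))
    return kept


exported = [
    '{"text": "1", "label": "TEST", "answer": "reject"}',
    '{"text": "2", "label": "TEST", "answer": "accept"}',
    '{"text": "3", "label": "TEST", "answer": "ignore"}',
]
accepted = filter_accepted(exported)
print(len(accepted))
```

The filtered lines would then be written to a new file and re-imported with `db-in` after the dataset has been dropped and recreated.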

Your approach seems fine, or you could write a custom recipe to do it:

from typing import List, Dict
import prodigy
from prodigy.components.db import Database, Dataset, Example, Link
from prodigy.util import log, print_stats


# The recipe name "cleanup" matches the command used further down; the
# argument annotation is reconstructed, adjust it to your setup.
@prodigy.recipe("cleanup", dataset=("Dataset to clean up", "positional", None, str))
def remove_rejected_examples(dataset: str):
    DB: Database = prodigy.components.db.connect()
    if dataset not in DB:
        raise ValueError(f"dataset {dataset} does not exist!")
    dataset_id = Dataset.get(Dataset.name == dataset).id
    links = Link.select().where(Link.dataset == dataset_id)
    to_delete: List[Link] = []
    invalid_link_ids: List[int] = []
    for link in links:
        try:
            content = link.example.load()
            # .get() avoids a KeyError for tasks that have no answer yet
            if content.get("answer") == "reject":
                to_delete.append(link)
        except Example.DoesNotExist:
            # If we find a broken link, remove it
            invalid_link_ids.append(link.id)

    # Grab ALL the links for the examples we want to remove, and
    # see how many references there are to each example. If there
    # are only two, we'll remove the example along with its links.
    links = Link.select().where(Link.example << [l.example_id for l in to_delete])
    link_counts: Dict[str, int] = {}
    for link in links:
        key = link.example_id
        if key not in link_counts:
            link_counts[key] = 0
        link_counts[key] += 1
    # If there are two or fewer links to this example it's okay to remove it.
    link_example_ids = [k for k, v in link_counts.items() if v <= 2]
    example_links = Link.select().where(Link.example << link_example_ids)
    all_links = [l.id for l in example_links] + invalid_link_ids
    # Delete the links first so no foreign key still points at the examples.
    Link.delete().where(Link.id << all_links).execute()
    to_delete_example_ids = list(set([l.example_id for l in example_links]))
    log(
        f"CLEANUP: Trashing {len(to_delete_example_ids)} examples and {len(all_links)} links"
    )
    to_delete_examples = Example.select().where(Example.id << to_delete_example_ids)
    trash_examples = [ex.load() for ex in to_delete_examples]
    trash_file = DB.add_to_trash(trash_examples, dataset)
    log(f"CLEANUP: Examples moved to trash: {trash_file}")
    Example.delete().where(Example.id << to_delete_example_ids).execute()
    log("CLEANUP: Examples and links removed from database")
    # The keyword name "stats" is reconstructed; check print_stats in your version.
    print_stats(
        title="Trash rejected examples",
        stats={
            "Dataset": dataset,
            "Removed": len(link_example_ids),
            "Trash": trash_file,
        },
    )
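
The link-counting step in the recipe can be looked at in isolation. Below is a minimal sketch in plain Python, with no Prodigy or peewee involved: links are modelled as `(link_id, example_id)` tuples, and `removable_example_ids` is a hypothetical helper. The threshold of two mirrors the recipe's `v <= 2` check, presumably one link from the dataset and one from the annotation session.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def removable_example_ids(
    links: Iterable[Tuple[int, int]],
    rejected_example_ids: Set[int],
    max_links: int = 2,
) -> List[int]:
    """Return rejected example ids that have at most `max_links` links."""
    counts = Counter(
        example_id
        for _link_id, example_id in links
        if example_id in rejected_example_ids
    )
    return sorted(eid for eid, n in counts.items() if n <= max_links)


# (link_id, example_id) pairs: example 10 has two links, example 11 has three.
links = [(1, 10), (2, 10), (3, 11), (4, 11), (5, 11), (6, 12)]
removable = removable_example_ids(links, rejected_example_ids={10, 11})
print(removable)
```

Example 11 is left alone here because a third link suggests another dataset still refers to it.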

Try it with a small test file, saved as data.jsonl to match the db-in command below:

{ "text": "1", "label": "TEST", "answer":"reject" }
{ "text": "2", "label": "TEST", "answer":"reject"  }
{ "text": "3", "label": "TEST" }

prodigy dataset test-dataset "test for removing rejected examples"
prodigy db-in test-dataset ./data.jsonl
prodigy cleanup test-dataset -F ./
prodigy stats test-dataset

The upside of a custom recipe is that the examples can be added to the Prodigy trash before being removed, so you can recover them if need be:

  ✨  Trash rejected examples

Dataset   test-dataset                  
Removed   2                             
Trash     /yourpath/trash/test-dataset.jsonl

Ah yes, of course. I keep forgetting to use recipes for more than just labelling tasks. Thanks @justindujardin.


@nix411, just curious: why would you want to delete ‘rejected’ examples? I would have thought that a mixture of ‘accepts’ and ‘rejects’ is useful for the model’s learning.


I am doing information extraction. Right now I am verifying whether the extracted information is correct.

  • accept: use the extracted information to create unit tests for my application.
  • reject: implement a fix. Rerun the classification on the failing ones.

I hope that makes sense, though maybe there is a better workflow?

@justindujardin Hi Justin, I tried your script to modify the database in place and it works perfectly with the SQLite database. I'm now switching the default backend database to PostgreSQL, and I'm getting an error on this line:

Example.delete().where(Example.id << example_ids).execute()

peewee.IntegrityError: update or delete on table "example" violates foreign key constraint "link_example_id_fkey" on table "link"
DETAIL: Key (id)=(435954) is still referenced from table "link".

Any idea how to get around this?
Best regards

Hi @AlejandroJCR,

The problem is that PostgreSQL enforces foreign key constraints and SQLite does not (by default). I updated the snippet above to also look for and remove the links that sessions hold to rejected examples. Can you try the updated version and confirm it works?

Hello Justin, thank you for your quick response and update. I am still getting the same error, though. In case you ask: I connect to PostgreSQL in the standard way.

Okay, I reproduced the error (in SQLite, by enabling foreign key constraints) and fixed it. Please try again.
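
For anyone who wants to see the difference for themselves, here is a rough sketch of that kind of reproduction using only Python's standard `sqlite3` module. The two-table schema is a simplified stand-in, not Prodigy's real one, and the helper names are made up for illustration.

```python
import sqlite3


def make_db(enforce_foreign_keys: bool) -> sqlite3.Connection:
    """Build a tiny in-memory database with one example and one link."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_foreign_keys else 'OFF'}")
    conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, content TEXT)")
    conn.execute(
        "CREATE TABLE link ("
        "id INTEGER PRIMARY KEY, "
        "example_id INTEGER NOT NULL REFERENCES example(id))"
    )
    conn.execute("INSERT INTO example (id, content) VALUES (1, 'rejected task')")
    conn.execute("INSERT INTO link (id, example_id) VALUES (1, 1)")
    return conn


def delete_example_only(conn: sqlite3.Connection) -> bool:
    """Try to delete the example while a link still references it."""
    try:
        conn.execute("DELETE FROM example WHERE id = 1")
        return True
    except sqlite3.IntegrityError:
        return False


def delete_links_then_example(conn: sqlite3.Connection) -> None:
    """Delete in dependency order: links first, then the example."""
    conn.execute("DELETE FROM link WHERE example_id = 1")
    conn.execute("DELETE FROM example WHERE id = 1")


print(delete_example_only(make_db(enforce_foreign_keys=False)))
print(delete_example_only(make_db(enforce_foreign_keys=True)))
```

With enforcement on, the delete only goes through once the referencing links have been removed, which is the order the updated recipe follows.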

I tried it just now and it works very well. Thank you very much, Justin.