DB.drop_examples() raises ForeignKeyViolation

>>> DB.drop_examples(task_hashes)

ForeignKeyViolation: update or delete on table "example" violates foreign key constraint "link_example_id_fkey" on table "link"
DETAIL:  Key (id)=(48457) is still referenced from table "link".

I think the appropriate link records should be deleted automatically, either using CASCADE on the foreign key or explicitly.

Hi @simon.gurcke,

Thanks for the report! It looks like the issue is that our SQLite database adapter doesn't enforce foreignkey constraints by default, so we get different behaviors between adapters.

I'll look at enabling ForeignKey constraints in our SQLite adapter and fixing any ripple effects it has on our test suite. I'll also look at cascading deletes.

In the short-term, you can modify db.py in your installation path with an updated function to explicitly drop links:

    def drop_examples(self, ids, by="task_hash"):
        """
        ids (list): The IDs of the examples to drop.
        by (unicode): ID to get examples by. Defaults to 'task_hash'.
        """
        try:
            ids = list(ids)
        except TypeError:
            ids = [ids]
        field = getattr(Example, by)
        ids = list(ids)
        with self.db.atomic():
            example_ids = [e.id for e in Example.select(Example.id).where(field << ids)]
            Link.delete().where(Link.example << example_ids).execute()
            Example.delete().where(field << ids).execute()
            self.db.commit()

or if you prefer to avoid modifying db.py you can use this standalone snippet:

from prodigy.components.db import Database, Example, Link, connect


def drop_examples(database: Database, ids, by="task_hash"):
    """
    ids (list): The IDs of the examples to drop.
    by (unicode): ID to get examples by. Defaults to 'task_hash'.
    """
    try:
        ids = list(ids)
    except TypeError:
        ids = [ids]
    field = getattr(Example, by)
    ids = list(ids)
    with database.db.atomic():
        example_ids = [e.id for e in Example.select(Example.id).where(field << ids)]
        Link.delete().where(Link.example << example_ids).execute()
        Example.delete().where(field << ids).execute()
        database.db.commit()


adapter: Database = connect()
adapter.add_examples(
    [
        {"text": "1", "label": "a", "_input_hash": 1, "_task_hash": 1337},
        {"text": "2", "label": "a", "_input_hash": 2, "_task_hash": 1337},
        {"text": "3", "label": "a", "_input_hash": 3, "_task_hash": 1337},
    ],
    ("dataset",),
)
assert len(adapter.get_dataset("dataset")) == 3
drop_examples(adapter, [1337])
assert len(adapter.get_dataset("dataset")) == 0

1 Like