Is it possible to change the database schema or table names used by Prodigy?

I’m connecting Prodigy to an existing Postgres cluster, and I’d like to be able to access the annotations created by users in other parts of our system, ideally in conjunction with other tables that already exist for joins, etc. Currently, the table names used by prodigy (dataset, example, link as I understand) are not yet used, but to be on the safe side I’d like to give Prodigy its own namespace, or somehow add a prefix to the table names (in case we ever need those table names to mean something else). Is there any way to do this?

The source of the database model (using peewee) is included with Prodigy in components/db.py, so you can check out how it is set up there. For 100% flexibility, you can also implement your own custom Database class – see here for more details:

As long as the database methods exist and do the right thing, it's then fully up to you how you implement the logic and how the data is stored or retrieved.

Thanks @ines - I ended up finding a simple solution, it’s possible to set a default schema using the search_path setting:

_db = PostgresqlExtDatabase(
        params["database"],
        user=params["user"],
        password=params["password"],
        host=params["host"],
        options='-c search_path="prodigy"'  # use schema to separate data from core app
    )

Tables are then created as prodigy.dataset, etc.

1 Like

Hi @phdowling

Where do you get params from and how do you make sure that you use _db and how does your prodigy.json look like?

I'm having this

from playhouse.postgres_ext import PostgresqlExtDatabase
from prodigy.components.db import Database


class CustomDatabase(Database):
    def __init__(self, display_id="custom", display_name=None):
        super().__init__(
            PostgresqlExtDatabase(
                "postgres",
                user="postgres",
                password="pass",
                host="localhost",
                options='-c search_path="prodigy"',
            ),
            display_id,
            display_name,
        )

and point to CustomDatabase in prodigy_db entry point. Then I have my prodigy.json db to point to that entry point but I get the following error

AttributeError: type object 'CustomDatabase' has no attribute 'db_name'

which is odd since I extend Database from prodigy.components.db

Turns out I have to point to an instance of Database instead. So the following works by having entry point to db object

from playhouse.postgres_ext import PostgresqlExtDatabase
from prodigy.components.db import Database

_db = PostgresqlExtDatabase(
    "postgres",
    user="postgres",
    password="pass",
    host="localhost"
)
db = Database(_db, "db", "DB")

However if I add the following to PostgresqlExtDatabase

    options='-c search_path="prodigy"',

then I get an error when running prodigy

peewee.ProgrammingError: no schema has been selected to create in
LINE 1: CREATE TABLE IF NOT EXISTS "example" ("id" SERIAL NOT NULL P.

I've created a prodigy schema in the database that I connect to...

For anyone that is still having an issue setting the schema. You can set it in the prodigy.json file using the options parameter per the comments above. However, because it's JSON, you need to use double quotes instead of single and unquote the schema name.

Change

'options='-c search_path="prodigy"'

to

"options": "-c search_path=prodigy"

Your JSON file should look like this.

{
    "db": "postgresql",
    "db_settings": {
        "postgresql": {
            "dbname": "YOUR_DB_NAME",
            "user": "YOUR_USERNAME",
            "password": "YOUR_PASSWORD",
            "host": "YOUR_HOST",
            "port": "YOUR_PORT",
            "options": "-c search_path=YOUR_SCHEMA"
        }
    }
}
1 Like