I’m connecting Prodigy to an existing Postgres cluster, and I’d like to be able to access the annotations created by users in other parts of our system, ideally in conjunction with other tables that already exist for joins, etc. Currently, the table names used by prodigy (dataset, example, link as I understand) are not yet used, but to be on the safe side I’d like to give Prodigy its own namespace, or somehow add a prefix to the table names (in case we ever need those table names to mean something else). Is there any way to do this?
The source of the database model (using peewee
) is included with Prodigy in components/db.py
, so you can check out how it is set up there. For 100% flexibility, you can also implement your own custom Database
class – see here for more details:
As long as the database methods exist and do the right thing, it's then fully up to you how you implement the logic and how the data is stored or retrieved.
Thanks @ines - I ended up finding a simple solution, it’s possible to set a default schema using the search_path setting:
_db = PostgresqlExtDatabase(
params["database"],
user=params["user"],
password=params["password"],
host=params["host"],
options='-c search_path="prodigy"' # use schema to separate data from core app
)
Tables are then created as prodigy.dataset
, etc.
Hi @phdowling
Where do you get params
from and how do you make sure that you use _db
and how does your prodigy.json
look like?
I'm having this
from playhouse.postgres_ext import PostgresqlExtDatabase
from prodigy.components.db import Database
class CustomDatabase(Database):
def __init__(self, display_id="custom", display_name=None):
super().__init__(
PostgresqlExtDatabase(
"postgres",
user="postgres",
password="pass",
host="localhost",
options='-c search_path="prodigy"',
),
display_id,
display_name,
)
and point to CustomDatabase
in prodigy_db
entry point. Then I have my prodigy.json
db to point to that entry point but I get the following error
AttributeError: type object 'CustomDatabase' has no attribute 'db_name'
which is odd since I extend Database
from prodigy.components.db
Turns out I have to point to an instance of Database
instead. So the following works by having entry point to db
object
from playhouse.postgres_ext import PostgresqlExtDatabase
from prodigy.components.db import Database
_db = PostgresqlExtDatabase(
"postgres",
user="postgres",
password="pass",
host="localhost"
)
db = Database(_db, "db", "DB")
However if I add the following to PostgresqlExtDatabase
options='-c search_path="prodigy"',
then I get an error when running prodigy
peewee.ProgrammingError: no schema has been selected to create in
LINE 1: CREATE TABLE IF NOT EXISTS "example" ("id" SERIAL NOT NULL P.
I've created a prodigy
schema in the database that I connect to...
For anyone that is still having an issue setting the schema. You can set it in the prodigy.json file using the options parameter per the comments above. However, because it's JSON, you need to use double quotes instead of single and unquote the schema name.
Change
'options='-c search_path="prodigy"'
to
"options": "-c search_path=prodigy"
Your JSON file should look like this.
{
"db": "postgresql",
"db_settings": {
"postgresql": {
"dbname": "YOUR_DB_NAME",
"user": "YOUR_USERNAME",
"password": "YOUR_PASSWORD",
"host": "YOUR_HOST",
"port": "YOUR_PORT",
"options": "-c search_path=YOUR_SCHEMA"
}
}
}