I cannot pass Postgres db details as a config dict

Hi,

I need to call Prodigy from Python. Boiled down, my code looks something like this:


@prodigy.recipe("custom_recipe")
def choice():
    return {
        "view_id": "choice",
        "dataset": "some_dataset",
        "stream": some_stream(),
    }

prodigy.serve("custom_recipe", **prodigy_config)

where prodigy_config looks like this:

prodigy_config = {
    "choice_style": "multiple",
    "db": 'postgresql',
    "db_settings": {
        "postgresql": {
            "host": "127.0.0.1",
            "dbname": "xxx",
            "user": "xxx",
            "password": "xxx",
            "port": 5432,
        }
    }
}

Running the recipe works, but it saves the examples into an SQLite db (✔ Saved 1 annotations to database SQLite), and not into the Postgres db defined in the config dict.

In contrast, if I pass it a db object from your connect function, it works:

    from prodigy.components.db import connect

    db = connect(db_id='postgresql', db_settings=prodigy_config["db_settings"]["postgresql"])

    @prodigy.recipe("cats_recipe")
    def choice():
        return {
            "view_id": "choice",
            "dataset": "some_dataset",
            "stream": some_stream(),
            "db": db,
        }

    prodigy.serve("cats_recipe", **prodigy_config)

Running this saves the example data and displays: ✔ Saved 1 annotations to database PostgreSQL

Am I doing something wrong here? I stuck to the docs at https://prodi.gy/docs/api-database but couldn't find a mistake.

Cheers,
Stefan

Could you check your prodigy.json and see if you have a value for "db" and "db_settings" in there? The global and local prodigy.json configs override the defaults defined in the recipe, so maybe that's what's happening here?
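
One way to run that check is a small script that looks for those two keys. This is only a sketch: it assumes the global file lives in the directory named by PRODIGY_HOME (falling back to ~/.prodigy) and the local one in the current working directory, and both helper names are made up for illustration.

```python
import json
import os
from pathlib import Path


def candidate_config_paths():
    """Assumed locations: PRODIGY_HOME (default ~/.prodigy) and the working directory."""
    home = Path(os.environ.get("PRODIGY_HOME", Path.home() / ".prodigy"))
    return [home / "prodigy.json", Path.cwd() / "prodigy.json"]


def find_db_overrides(paths):
    """Map each existing config file to the "db" / "db_settings" values it sets."""
    found = {}
    for path in paths:
        path = Path(path)
        if not path.is_file():
            continue  # a missing file cannot override anything
        with path.open(encoding="utf8") as f:
            config = json.load(f)
        hits = {key: config[key] for key in ("db", "db_settings") if key in config}
        if hits:
            found[str(path)] = hits
    return found
```

Calling find_db_overrides(candidate_config_paths()) returns an empty dict if neither file sets a database, which would rule out a prodigy.json override.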

Unfortunately, no prodigy.json is interfering. I confirmed this by creating a new project whose directory contains only this Python script, with the following content:

import prodigy


def get_cats():

    return [
        {"id": "option_a_id", "text": "option_a_value"},
        {"id": "option_b_id", "text": "option_b_value"},
        {"id": "option_c_id", "text": "option_c_value"},
    ]


def some_stream():

    cats = get_cats()

    yield {"text": "Some text 1", "options": cats}
    yield {"text": "Some text 2", "options": cats}
    yield {"text": "Some text 3", "options": cats}
    yield {"text": "Some text 4", "options": cats}


prodigy_config = {
    "choice_style": "multiple",
    "db": 'postgresql',
    "db_settings": {
        "postgresql": {
            "host": "127.0.0.1",
            "dbname": "xxx",
            "user": "xxx",
            "password": "xxx",
            "port": 5432,
        }
    }
}


@prodigy.recipe("custom_recipe")
def choice():
    return {
        "view_id": "choice",
        "dataset": "some_dataset",
        "stream": some_stream(),
    }

prodigy.serve("custom_recipe", **prodigy_config)

And a pip freeze shows that these packages are installed in my environment:

aiofiles==0.5.0
backcall==0.1.0
blis==0.4.1
cachetools==4.1.0
catalogue==1.0.0
certifi==2020.4.5.1
chardet==3.0.4
click==7.1.2
cymem==2.0.3
dataclasses==0.7
de-core-news-sm @ https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-2.2.5/de_core_news_sm-2.2.5.tar.gz
decorator==4.4.2
fastapi==0.44.1
h11==0.9.0
httptools==0.1.1
idna==2.9
importlib-metadata==1.6.0
ipython==7.15.0
ipython-genutils==0.2.0
jedi==0.17.0
murmurhash==1.0.2
numpy==1.18.4
pandas==1.0.4
parso==0.7.0
peewee==3.13.3
pexpect==4.8.0
pickleshare==0.7.5
pkg-resources==0.0.0
plac==1.1.3
preshed==3.0.2
prodigy @ file:///home/steff-vm/mara/root_mara_single_docker/libs/prodigy-1.9.9-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl
prompt-toolkit==3.0.5
psycopg2==2.8.5
ptyprocess==0.6.0
pydantic==1.5.1
Pygments==2.6.1
PyJWT==1.7.1
python-dateutil==2.8.1
pytz==2020.1
requests==2.23.0
six==1.15.0
spacy==2.2.4
srsly==1.0.2
starlette==0.12.9
thinc==7.4.0
toolz==0.10.0
tqdm==4.46.1
traitlets==4.3.3
urllib3==1.25.9
uvicorn==0.11.5
uvloop==0.14.0
wasabi==0.6.0
wcwidth==0.2.3
websockets==8.1
xlrd==1.2.0
xlwt==1.3.0
zipp==3.1.0

Again, if I run this, the data is saved to SQLite.

Could this be a bug on your side?

Unfortunately, setting PRODIGY_LOGGING=verbose doesn't show anything more than running without it.

Could the prodigy package itself be storing settings somewhere else?

Ah, sorry for the confusion, I think I forgot that this currently isn't supposed to work, since Prodigy connects to the database right after the recipe is executed and doesn't currently use the recipe config.

Can you just initialize the db in the recipe instead? It should do the same thing and allow the same workflow.
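
A minimal sketch of that workaround, reusing the prodigy_config dict from the posts above. The helper names resolve_db_args and build_components are hypothetical, and the fallback to "sqlite" when no "db" key is present is an assumption. The connect call uses the same arguments as in the original post.

```python
def resolve_db_args(config):
    """Pick the backend name and its settings out of a prodigy.json-style dict."""
    db_id = config.get("db", "sqlite")  # assumed fallback when "db" is missing
    db_settings = config.get("db_settings", {}).get(db_id, {})
    return db_id, db_settings


def build_components(config, stream):
    """Build the recipe's return value with an explicitly connected database."""
    # Imported here so resolve_db_args can be used without Prodigy installed.
    from prodigy.components.db import connect

    db_id, db_settings = resolve_db_args(config)
    return {
        "view_id": "choice",
        "dataset": "some_dataset",
        "stream": stream,
        # Passing the connected object makes the recipe use this database.
        "db": connect(db_id=db_id, db_settings=db_settings),
    }
```

The recipe function would then just return build_components(prodigy_config, some_stream()), which keeps the connection details in the one config dict.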

I see. Thanks for the clarification, Ines. Yeah, sure, then I'll initialize the db object.

Cheers!
