Configure a non-SQLite database (e.g. PostgreSQL) without storing the password in prodigy.json

Sure, that’s no problem! What’s your preferred way of handling passwords? Environment variables?

Another thing you could do, which gives you even more flexibility, is to connect to the database in Python. Prodigy exposes a connect function which takes the database type and a dictionary of database settings as its arguments:

```python
from prodigy.components.db import connect
db = connect('postgresql', {'dbname': 'xxx', 'user': 'xxx', 'password': 'xxx'})
```
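If environment variables are your preferred approach, here is a minimal sketch of building that settings dictionary from them, so the password never has to live in prodigy.json or in your code. The variable names (`PRODIGY_DB_NAME`, `PRODIGY_DB_USER`, `PRODIGY_DB_PASSWORD`) and the helper `postgres_settings_from_env` are hypothetical placeholders, not names Prodigy reads on its own. The returned dict is what you would pass as the second argument to `connect()`.

```python
import os

# Hypothetical variable names: pick whatever fits your deployment.
REQUIRED_VARS = ("PRODIGY_DB_NAME", "PRODIGY_DB_USER", "PRODIGY_DB_PASSWORD")


def postgres_settings_from_env(env=os.environ):
    """Build the Postgres settings dict for connect() from environment variables.

    Raises KeyError listing every missing variable, so a misconfigured
    environment fails early instead of at connection time.
    """
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "dbname": env["PRODIGY_DB_NAME"],
        "user": env["PRODIGY_DB_USER"],
        "password": env["PRODIGY_DB_PASSWORD"],
    }


# Usage (needs Prodigy installed and a reachable Postgres server):
# from prodigy.components.db import connect
# db = connect('postgresql', postgres_settings_from_env())
```

Accepting `env` as a parameter keeps the helper easy to test with a plain dict instead of touching the real environment.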

You can pass a custom DB to Prodigy as the 'db' key of the components dictionary returned by a recipe: either one created by the connect() function, or an entirely custom one that follows the same API (see the README for the API documentation).

If you’re using one of Prodigy’s built-in recipes, you can also wrap it in a custom recipe and overwrite the database (or execute any other code you like). Recipes are just Python functions that return a dictionary of components, so you can import an existing one, call it with your arguments, receive the recipe components, overwrite the DB and return the components from your custom recipe:

```python
import prodigy
from prodigy.components.db import connect
from prodigy.recipes.ner import teach  # import the built-in ner.teach recipe

@prodigy.recipe('ner.teach.wrapper')
def ner_teach_wrapper(dataset, model, source, label=None):
    # pass in the arguments of ner.teach and get back the recipe components
    components = teach(dataset, model, source=source, label=label)
    # this will return a dict like {'dataset': dataset, 'stream': stream} etc.
    # create the database connection (fill in your own settings)
    db = connect('postgresql', {'dbname': 'xxx', 'user': 'xxx', 'password': 'xxx'})
    components['db'] = db   # overwrite the database with the one you created
    return components  # return the recipe components
```
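Because a recipe is just a function returning a dict, the override can also be factored into a small helper that works with any recipe function. This is a sketch, not part of Prodigy's API: `with_db`, the stand-in `fake_recipe` and the `my_db` placeholder are hypothetical names. In practice you would pass in a built-in recipe such as `teach` and the object returned by `connect()`.

```python
from functools import wraps


def with_db(recipe_func, db):
    """Wrap a recipe function so the components it returns use `db`.

    Hypothetical helper: it calls the original recipe unchanged and only
    overwrites the 'db' key of the returned components dict.
    """
    @wraps(recipe_func)
    def wrapper(*args, **kwargs):
        components = recipe_func(*args, **kwargs)
        components["db"] = db
        return components
    return wrapper


# Stand-in for a built-in recipe: returns a components dict.
def fake_recipe(dataset, source):
    return {"dataset": dataset, "stream": iter([{"text": source}]), "view_id": "text"}


my_db = object()  # placeholder for the object returned by connect()
wrapped = with_db(fake_recipe, my_db)
components = wrapped("my_dataset", "hello")
```

This is mainly handy when calling recipes from Python. For the command line, the explicit wrapper with named arguments is likely the safer choice, since the recipe function's arguments are what the CLI exposes.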

Then you can run your recipe just like ner.teach:

```
prodigy ner.teach.wrapper my_dataset en_core_web_sm my_data.jsonl -F recipe.py
```

You can read more on this in the PRODIGY_README.
