batch_train with custom database connection

I would like to define my database connection in code, so I can read the connection details from an encrypted file. I know I can specify the database details in prodigy.json, but I don't like to have sensitive information in there unencrypted, given that it will be committed to version control.

With the teach recipe I could just overwrite the db component, but how do I do the same when calling batch_train?

Hi, can't you just apply the method described in this Gist (also posted on the forum) and use the prodigy.json.tpl to set up your DB connection via environment variables? That's more or less best practice for keeping credentials out of version control.
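To make the template idea concrete, here's a minimal sketch of rendering a `prodigy.json` from a template with environment variables. The template contents and the `PRODIGY_DB_NAME` variable name are assumptions for illustration, not something Prodigy defines:

```python
import json
import os
import string

# Keep this template (prodigy.json.tpl) in version control; the real
# credentials only ever live in environment variables.
template = '{"db": "postgresql", "db_settings": {"postgresql": {"dbname": "$PRODIGY_DB_NAME"}}}'

# Set here only so the sketch is self-contained; normally this comes
# from your shell, CI secrets, etc.
os.environ["PRODIGY_DB_NAME"] = "prodigy"

# Substitute $VAR placeholders from the environment and write prodigy.json.
rendered = string.Template(template).substitute(os.environ)
config = json.loads(rendered)
```

You'd run this (or an equivalent shell step) before starting Prodigy, so the rendered `prodigy.json` never needs to be committed.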


Thanks, that may be a workaround. However, I'm working in a conda environment rather than a Docker container. Maybe there is an easier way?

Maybe a solution would be to use dotenv and apply the prodigy.json.tpl trick? :slight_smile:

Another option could be to edit the recipe source and replace `DB = connect()` with your custom database. The `DB` here is expected to be a regular Prodigy `Database` class.
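A minimal sketch of that pattern, assuming your connection details sit in an encrypted file. The decryption step is stubbed out with plain JSON so the sketch is self-contained, and `load_db_settings` is a hypothetical helper, not part of Prodigy:

```python
import json

def load_db_settings(raw: bytes) -> dict:
    """Hypothetical helper: decrypt and parse connection details.

    Decryption is stubbed with plain JSON here; swap in your real
    decryption step when reading the encrypted file.
    """
    return json.loads(raw.decode("utf-8"))

settings = load_db_settings(b'{"db": "postgresql", "dbname": "prodigy"}')

# In your copy of the recipe source, replace the line `DB = connect()`
# with something along these lines (connect lives in
# prodigy.components.db and takes the DB type plus its settings):
# DB = connect(settings.pop("db"), settings)
```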

If you want things to be really elegant, you could also wrap your custom database class in a Python package and expose it via the `prodigy_db` entry point group. You can find more details about this in the "Entry Points" section of your PRODIGY_README.html. All your prodigy.json would then need to do is specify `"db": "your_custom_db"`, and the entry point will tell Prodigy how to resolve that name and what database to initialize.
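As a packaging sketch, the package's setup.py could look something like this. The module name `my_prodigy_db` and class `CustomDatabase` are made up for illustration; only the `prodigy_db` group name comes from the README:

```python
# setup.py for a hypothetical package exposing a custom Prodigy database.
# Everything here except the "prodigy_db" entry point group is an assumption.
from setuptools import setup

setup(
    name="my_prodigy_db",
    version="0.1.0",
    py_modules=["my_prodigy_db"],
    entry_points={
        # Prodigy resolves "db": "your_custom_db" in prodigy.json via
        # this group and initializes the class it points to.
        "prodigy_db": ["your_custom_db = my_prodigy_db:CustomDatabase"],
    },
)
```

After `pip install .`, the `"db": "your_custom_db"` setting in prodigy.json is all that's needed; no credentials have to appear in the config itself, since your class can read them however it likes.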

Thanks @ines, the entry point solution is very elegant indeed and works perfectly.
