What DB privileges does Prodigy use?

Prodigy uses the peewee package to manage the database integration, which should give you a lot of flexibility in terms of setup and debugging. (It also allows custom configuration, in case you need it.)

If I’m not mistaken, Prodigy only uses the basic operations, so once the tables are created, you should be able to get by with SELECT, INSERT, UPDATE and DELETE. You can probably even leave out DELETE, since it’s only ever used if you run the prodigy drop command to delete datasets. Prodigy needs the tables Dataset, Example and Link and will try to CREATE them if they don’t exist, so the database user also needs CREATE on the first run unless you set up the tables beforehand.
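
To make that concrete, here is a minimal sketch that prints MySQL-style GRANT statements for the privileges mentioned above. The database name "prodigy", the user "prodigy_user", the host and the helper build_grants are all made up for illustration. The lowercase table names are an assumption too, so check what the tables are actually called in your database before running anything.

```python
def build_grants(database, user, host="localhost",
                 tables=("dataset", "example", "link"),
                 allow_delete=False):
    """Build one GRANT statement per table (hypothetical helper, MySQL syntax).

    The default table names are an assumption based on the Dataset, Example
    and Link tables; verify the real names in your own database.
    """
    privileges = ["SELECT", "INSERT", "UPDATE"]
    if allow_delete:
        # DELETE should only be needed for the "prodigy drop" command
        privileges.append("DELETE")
    privilege_list = ", ".join(privileges)
    return [
        f"GRANT {privilege_list} ON `{database}`.`{table}` TO '{user}'@'{host}';"
        for table in tables
    ]


# Placeholder database and user names, not values Prodigy requires
statements = build_grants("prodigy", "prodigy_user")
for statement in statements:
    print(statement)
```

Passing allow_delete=True adds DELETE to each statement, which you'd only want for a user that's allowed to drop datasets.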

In your PRODIGY_README.html, you can find an overview of the available database methods. So in order to test the database connection, you can also write a little script that performs the most important operations:

from prodigy.components.db import connect

db = connect()  # uses the settings in your prodigy.json
db.add_dataset('test_dataset')  # add a dataset
assert len(db) == 1  # only holds if the database was empty before this script ran
assert len(db.datasets) == 1
assert 'test_dataset' in db
print(db.datasets)  # ['test_dataset']

examples = [{'text': 'hello world', '_task_hash': 123, '_input_hash': 456}]
db.add_examples(examples, ['test_dataset'])  # add examples to the dataset
dataset = db.get_dataset('test_dataset')  # retrieve a dataset
assert len(dataset) == 1

Btw, if you’re using the MySQLdb driver, you might have to use "name", "dbname" or "database" (instead of "db") to specify the database name in your prodigy.json. See this thread for more details. This will be fixed in the next release.
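
As a rough sketch of what that looks like, the snippet below builds the MySQL part of a prodigy.json and prints it as JSON, with the key for the database name swapped out. The "db" / "db_settings" layout and the "host", "user" and "passwd" keys are my assumption about a typical setup, and the values and the helper mysql_settings are placeholders. Which of the alternative keys your driver accepts is something you'd have to try.

```python
import json


def mysql_settings(database, db_key="db"):
    """Build prodigy.json-style MySQL settings (hypothetical helper).

    db_key is the key used for the database name, e.g. "db", "name",
    "dbname" or "database", depending on what the driver accepts.
    """
    return {
        "db": "mysql",
        "db_settings": {
            "mysql": {
                "host": "localhost",      # placeholder
                "user": "prodigy_user",   # placeholder
                "passwd": "change-me",    # placeholder
                db_key: database,
            }
        },
    }


# Use "database" instead of "db" for the database name
config = mysql_settings("prodigy", db_key="database")
print(json.dumps(config, indent=2))
```

You can paste the printed JSON into your prodigy.json and then run the connection test from above to see whether the key is picked up.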
