What DB privileges does Prodigy use?

I’m in the process of setting up a MySQL DB for Prodigy. I’m planning on using the Text Classification workflow for now. Is there any documentation of the DB privileges needed for this task? I’d prefer not to create a user with ALL privileges, since I’m storing sensitive data for this task. I’m guessing at least INSERT and SELECT are needed? Anything else? Here’s the full list of privileges for MySQL: https://dev.mysql.com/doc/refman/5.5/en/privileges-provided.html

I’m okay with trial and error but if anyone knows the answer it would be really helpful. Thanks!

Prodigy uses the peewee package to manage the database integration, which should hopefully give you a lot of flexibility in terms of setup and debugging. (It also allows custom configuration, in case you need it.)

If I’m not mistaken, Prodigy only uses the basic operations, so once the tables are created, you should be able to get by with SELECT, INSERT, UPDATE and DELETE. You can probably even leave out DELETE, since it’s only ever used if you run the prodigy drop command to delete datasets. Prodigy needs the tables Dataset, Example and Link, and will try to CREATE them if they don’t exist, so the user needs CREATE for that first run unless the tables already exist.
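As a rough sketch of what this could look like on the MySQL side, the snippet below builds a GRANT statement for the reduced privilege set described above. The database name, user and host are hypothetical placeholders, and the privilege list simply mirrors the (unconfirmed) set above, so treat it as a starting point for trial and error rather than a verified minimum. It assumes the Dataset, Example and Link tables already exist, e.g. because they were created on a first run with a more privileged account.

```python
# Privilege set taken from the discussion above; this is not an official list.
BASE_PRIVILEGES = ("SELECT", "INSERT", "UPDATE")


def grant_statement(database, user, host="localhost", include_delete=True):
    """Build a MySQL GRANT statement for a Prodigy-only user (sketch).

    Names are interpolated without escaping, so only pass trusted,
    hard-coded values here.
    """
    privileges = list(BASE_PRIVILEGES)
    if include_delete:
        # DELETE is only needed if you remove datasets, e.g. via prodigy drop
        privileges.append("DELETE")
    return f"GRANT {', '.join(privileges)} ON `{database}`.* TO '{user}'@'{host}';"


if __name__ == "__main__":
    # Hypothetical database and user names
    print(grant_statement("prodigy_db", "prodigy_user"))
```

You would then run the printed statement as a MySQL admin user, e.g. from the mysql client, and drop DELETE with include_delete=False if you never delete datasets.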

In your PRODIGY_README.html, you can find an overview of the available database methods. So in order to test the database connection, you can also write a little script that performs the most important operations:

```python
from prodigy.components.db import connect

db = connect()  # uses the settings in your prodigy.json

# Note: the checks below assume the database contains no datasets yet
db.add_dataset('test_dataset')  # add a dataset
assert len(db) == 1
assert len(db.datasets) == 1
assert 'test_dataset' in db
print(db.datasets)  # ['test_dataset']

examples = [{'text': 'hello world', '_task_hash': 123, '_input_hash': 456}]
db.add_examples(examples, ['test_dataset'])  # add examples to the dataset
dataset = db.get_dataset('test_dataset')  # retrieve a dataset
assert len(dataset) == 1

# Clean up so the script can be run again (this needs the DELETE privilege)
db.drop_dataset('test_dataset')
```

Btw, if you’re using the MySQLdb driver, you might have to use "name", "dbname" or "database" (instead of "db") to specify the database name in your prodigy.json. See this thread for more details; this will be fixed in the next release.
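For reference, here’s a minimal sketch that builds the MySQL part of a prodigy.json and prints it as JSON. The layout (a top-level "db": "mysql" plus a nested "db_settings" / "mysql" section) and the "passwd" key are assumptions based on how the settings get passed on to peewee and the driver, so double-check them against your PRODIGY_README.html. The host, user, password and database values are placeholders.

```python
import json


def prodigy_mysql_config(host, user, password, database):
    """Build a prodigy.json-style dict for MySQL (sketch; key layout assumed)."""
    return {
        "db": "mysql",
        "db_settings": {
            "mysql": {
                "host": host,
                "user": user,
                # Assumed key name; both MySQLdb and PyMySQL accept "passwd"
                "passwd": password,
                # Use "database" instead of "db", as suggested above for MySQLdb
                "database": database,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values only
    config = prodigy_mysql_config("localhost", "prodigy_user", "secret", "prodigy_db")
    print(json.dumps(config, indent=2))
```

You can paste the printed JSON into your prodigy.json, merging it with any settings you already have there.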

Worked perfectly. Thanks!

Thanks a lot for the update! I’ll also add a note about this to the docs, since I’m sure this will be relevant to other users as well :grinning:

Running prodigy drop from a Python script gives an invalid syntax error.
What is the correct syntax?

This function gives an assertion error. Why?

To remove a dataset, you can use the db.drop_dataset method with the name of the dataset – see your PRODIGY_README.html for details on the API.
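The invalid syntax error happens because prodigy drop is a command for the terminal, not Python code, so Python can’t parse it inside a script. From Python, you call the database method instead. The helper below is a hypothetical convenience wrapper, not part of Prodigy: it only assumes that the object returned by connect() supports "name in db" and db.drop_dataset(name), as used elsewhere in this thread.

```python
def drop_if_exists(db, name):
    """Drop the dataset `name` if it exists (hypothetical helper).

    `db` is assumed to be the object returned by
    prodigy.components.db.connect(). Returns True if a dataset was dropped.
    """
    if name in db:
        db.drop_dataset(name)
        return True
    return False
```

Usage would look something like db = connect() followed by drop_if_exists(db, 'test_dataset'). Checking membership first avoids relying on how drop_dataset behaves for a dataset that doesn’t exist.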

And which assert statement causes the assertion error? If one of those statements fails, it means that its condition does not evaluate to True: for example, if assert len(db.datasets) == 1 fails, it means that you don’t have exactly 1 dataset in your database. (Note that the above code is only an example showing how to write tests for the database; you’ll obviously have to adjust it for whatever you’re trying to do.)
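One way to make failures like this easier to debug is to report what the database actually contains instead of relying on bare asserts, which don’t tell you the actual values. The helper below is a hypothetical sketch: it only assumes that db.datasets returns a list of dataset names, as in the test script above. Leftover datasets, for instance from an earlier run, would show up in its output.

```python
def describe_db_state(db, expected_datasets):
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    actual = list(db.datasets)
    if len(actual) != len(expected_datasets):
        problems.append(
            f"expected {len(expected_datasets)} dataset(s), found {len(actual)}: {actual}"
        )
    for name in expected_datasets:
        if name not in actual:
            problems.append(f"dataset {name!r} is missing")
    return problems
```

For example, problems = describe_db_state(connect(), ['test_dataset']) and then print(problems) shows exactly which expectation didn’t hold.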

Got it. Really helpful. Thanks a lot.