How do tables map to datasets in the Prodigy database?

I just purchased Prodigy and am working through the documentation. I ran the simple database test script from the readme file. I used all defaults, so it created prodigy.db in the default Prodigy folder in my home directory, ~/.prodigy. The script is:

from prodigy.components.db import connect
db = connect()
db.add_dataset('test_dataset')
assert 'test_dataset' in db
examples = [{'text': 'hello world', '_task_hash': 123, '_input_hash': 456}]
db.add_examples(examples, ['test_dataset'])
dataset = db.get_dataset('test_dataset')
assert len(dataset) == 1

The first time I run this it works fine. The second time it fails, I think because the dataset now has length 2. But when I open the database with the sqlite3 command line tool and run the .dump command, all I see is this:

sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
COMMIT;
sqlite>

So where are the examples being stored? How does a "dataset" map to tables? There seem to be no tables in the sqlite3 database. Is a db.save() command being performed automatically?

Please explain in more detail what is happening and where the data is being saved because I don't see it in my database.

OK, I figured it out. I used the wrong sqlite3 command: "sqlite3 prodigy" instead of "sqlite3 prodigy.db". The first command just creates an empty database in a file called "prodigy". The second opens the actual database in the file prodigy.db.

The db.add_dataset('test_dataset') command creates a table called 'dataset' (if it doesn't exist yet) and adds a record with several fields, one being the name of the dataset (test_dataset). The db.add_examples(...) command creates a table called 'example' (if it doesn't already exist) and then adds the example as a record in that table. At some point in this process, it also adds a record to a table called 'link', which has foreign keys to both 'dataset' and 'example', to show that these examples belong to this dataset.

I would have liked to see this level of detail in the documentation, but at least I know how it works now.
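For anyone who hits the same problem: sqlite3 creates a new, empty database when the file name doesn't exist, which is why the dump above was empty. Here is a minimal sketch of a check that avoids this, using only Python's standard sqlite3 module. The helper name list_tables is made up for illustration and is not part of Prodigy.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an existing SQLite file.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    # sqlite3.connect() silently creates an empty database if the file is
    # missing, so refuse to open a path that does not exist yet.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables(os.path.expanduser("~/.prodigy/prodigy.db")) after running the test script should, going by the description above, include the 'dataset', 'example' and 'link' tables. A mistyped path such as ~/.prodigy/prodigy raises FileNotFoundError instead of quietly creating an empty file.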

Glad you found the answer! Adding the table info more prominently to the docs is a good idea. At the moment the tables that are created are only really mentioned in the section on permissions. If you haven't seen it yet, you can find the API docs of the Database class in your PRODIGY_README.html.

Here are the tables added by Prodigy and what's in them (I'll also copy that info over to the docs later):

| Table | Description |
| --- | --- |
| Dataset | The dataset IDs and dataset meta. |
| Example | The individual annotation examples. Each example is only added once, so if you add the same annotation to multiple datasets, it'll only have one record here. |
| Link | Example IDs linked to datasets. This is how Prodigy knows which examples belong to which datasets. |
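To make the three-table layout concrete, here is a rough sketch built on an in-memory SQLite database. The column names (name, content, example_id, dataset_id) and the helper examples_in are assumptions chosen for illustration, not Prodigy's actual schema, so inspect your own prodigy.db for the real columns. It only shows the idea: one example record, linked to two datasets through the link table.

```python
import sqlite3

# In-memory database with a simplified, assumed schema (not Prodigy's real one).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE example (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE link (
        id INTEGER PRIMARY KEY,
        example_id INTEGER REFERENCES example(id),
        dataset_id INTEGER REFERENCES dataset(id)
    );
    """
)

# Two datasets, but the example itself is stored only once.
conn.executemany("INSERT INTO dataset (name) VALUES (?)", [("set_a",), ("set_b",)])
conn.execute("INSERT INTO example (content) VALUES (?)", ('{"text": "hello world"}',))

# One link row per (example, dataset) pair.
conn.executemany(
    "INSERT INTO link (example_id, dataset_id) VALUES (?, ?)", [(1, 1), (1, 2)]
)
conn.commit()


def examples_in(conn, dataset_name):
    """Return the stored content of every example linked to the named dataset."""
    rows = conn.execute(
        """
        SELECT example.content
        FROM example
        JOIN link ON link.example_id = example.id
        JOIN dataset ON dataset.id = link.dataset_id
        WHERE dataset.name = ?
        ORDER BY example.id
        """,
        (dataset_name,),
    ).fetchall()
    return [content for (content,) in rows]
```

With this setup, examples_in(conn, "set_a") and examples_in(conn, "set_b") both return the same single example, while the example table holds one row and the link table holds two.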