Unexpected database behavior with multiple connections

Hello! I have a "real" Prodigy database in PostgreSQL. But when I experiment with new recipes, I use a local "toy" SQLite one, so that I don't pollute important data. I almost wiped out a really big dataset from PostgreSQL database because of a very counter-unintuitive behavior.
I was going to copy the dataset from the "real" database to the "toy" one, but first delete whatever was in the "toy" database under the same dataset name.
Here are the steps I performed:

db_id_sqlite = "sqlite"
db_settings_sqlite =  {
      "name": "prodigy.db",
       <...>
 }
db_local = connect(db_id_sqlite, db_settings_sqlite) 

db_id = "postgresql"
db_settings =  {
      "dbname": "prodigy",
      <...>
  }
db = connect(db_id, db_settings) 
dataset_name = 'dataset'
examples = db.get_dataset_examples(dataset_name)
local_examples = db_local.get_dataset_examples(dataset_name)

I was going to delete the dataset in db_local and copy newly annotated data, but I noticed that examples and local_examples where of the same length, and eventually I figured out that db_local was pointing to the "real" database! Even though db_local.db_id was saying "sqlite".

I understand it's probably because of peewee recommended setup where DB handle/proxy is a global variable. But this is totally unexpected. I don't know if you can fix that, but at least please put a warning in the docs.

Hi @nrodnova,

Thanks for pointing this out. You're right the DB proxy is a global variable which means that it's being updated with every call to the connect method.
This behavior is designed to prevent issues with multiple open connections and to ensure that all parts of the application are using the same database connection, but I see how it might end up being very misleading.

In your case it would be best to call .close() on the first DB object after doing all the required operations and before connecting to another DB to make sure you're interacting with the correct DB.

I've added a Hi @nrodnova,

Thanks for pointing this out. You're right the DB PROXY is a global variable which means that it's being updated with every call to connect method.
This behavior is designed to prevent issues with multiple open connections and to ensure that all parts of the application are using the same database connection, but I see how it might end up being very misleading.

In your case it would be best to call .close() on the first DB object after doing all the required operations and before connecting to another DB to make sure you're interacting with the correct DB.

I've added a warning to the connect API documentation per your suggestion, hopefully it makes this implementation detail more transparent. Thank you, again!

1 Like