Hello! I have a "real" Prodigy database in PostgreSQL. But when I experiment with new recipes, I use a local "toy" SQLite one, so that I don't pollute important data. I almost wiped out a really big dataset from PostgreSQL database because of a very counter-unintuitive behavior.
I was going to copy the dataset from the "real" database to the "toy" one, but first delete whatever was in the "toy" database under the same dataset name.
Here are the steps I performed:
db_id_sqlite = "sqlite"
db_settings_sqlite = {
"name": "prodigy.db",
<...>
}
db_local = connect(db_id_sqlite, db_settings_sqlite)
db_id = "postgresql"
db_settings = {
"dbname": "prodigy",
<...>
}
db = connect(db_id, db_settings)
dataset_name = 'dataset'
examples = db.get_dataset_examples(dataset_name)
local_examples = db_local.get_dataset_examples(dataset_name)
I was going to delete the dataset in db_local and copy newly annotated data, but I noticed that examples
and local_examples
where of the same length, and eventually I figured out that db_local
was pointing to the "real" database! Even though db_local.db_id
was saying "sqlite".
I understand it's probably because of peewee
recommended setup where DB handle/proxy is a global variable. But this is totally unexpected. I don't know if you can fix that, but at least please put a warning in the docs.