Prodigy Startup Fails with no such table: dataset

Hi Team,

I work with Google Cloud and created an image with Prodigy (v1.11.5), which I am trying to run on Cloud Run.

I would like to work with input files on GCS (gs://xyz/project1) via gcsfuse. I would like to store the prodigy.db file in the same location, too.

I defined the following environment variables:
PRODIGY_HOST: 0.0.0.0
PRODIGY_PORT: 8080
PRODIGY_HOME: /gcs/xyz/project1
PRODIGY_LOGGING: verbose

During startup, Prodigy tries to create/read the prodigy.db file, but it fails with the following error message:
"Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
cursor.execute(sql, params or ())
sqlite3.OperationalError: no such table: dataset"

On GCS I can see two new files:
prodigy.db 0 B
prodigy.db-journal 512 B

Do you have any idea why the database file cannot be created properly?

Thanks in advance for your support!

Best Regards,
Sándor

Hi! This is strange. It looks like, for some reason, the prodigy.db file was not created correctly. One quick thing to try would be to remove the file and restart so that Prodigy recreates the DB. Maybe it was just a temporary glitch. You could also double-check that your user has permission to create files in the PRODIGY_HOME directory.

Finally, as a quick workaround, you could also create a simple SQLite file manually with the tables Dataset, Example and Link.
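
Before removing or recreating the file, it may help to confirm what it actually contains. The snippet below is a minimal sketch using only Python's standard sqlite3 module; the helper name list_tables is made up for illustration and is not part of Prodigy. A 0 B file like the one listed above is treated by SQLite as an empty database, so it would report no tables at all, which is consistent with the "no such table: dataset" error.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path.

    Hypothetical helper for illustration; it is not part of Prodigy.
    """
    if not os.path.exists(db_path):
        # sqlite3.connect would silently create a new empty file otherwise.
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, list_tables("/gcs/xyz/project1/prodigy.db") returning an empty list would suggest that the schema was never written, rather than that the file is corrupted.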

Hi Ines, thanks for your reply!

It seems the database transaction cannot be completed on the filesystem mounted via gcsfuse, which is why the db-journal file is left behind.
When I use the default filesystem for the SQLite database (prodigy.db), it works properly.

I will try to find out what the problem could be around gcsfuse.
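
The comparison between the gcsfuse mount and the default filesystem can be reproduced without Prodigy. The following is a rough sketch, assuming only Python's standard library: it tries to create a table, insert a row and commit in a given directory, then reports whether that worked. The function name sqlite_works_in and the probe file name are made up for illustration.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def sqlite_works_in(directory):
    """Try a small create/insert/commit cycle in `directory`.

    Returns (ok, detail). Hypothetical helper, not part of Prodigy.
    """
    db_path = Path(directory) / "sqlite_probe.db"  # illustrative file name
    journal_path = Path(str(db_path) + "-journal")
    try:
        conn = sqlite3.connect(str(db_path))
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS probe (id INTEGER PRIMARY KEY, note TEXT)"
            )
            conn.execute("INSERT INTO probe (note) VALUES (?)", ("hello",))
            conn.commit()
            count = conn.execute("SELECT COUNT(*) FROM probe").fetchone()[0]
        finally:
            conn.close()
    except sqlite3.Error as err:
        return False, f"{type(err).__name__}: {err}"
    finally:
        # Best-effort cleanup of the probe file and any leftover journal.
        for leftover in (db_path, journal_path):
            try:
                leftover.unlink(missing_ok=True)
            except OSError:
                pass
    return True, f"{count} row(s) committed"


if __name__ == "__main__":
    # Compare the PRODIGY_HOME directory with the local temp directory.
    for directory in (os.environ.get("PRODIGY_HOME", "."), tempfile.gettempdir()):
        print(directory, sqlite_works_in(directory))
```

If the probe fails only in the mounted directory, that points at the filesystem rather than at Prodigy itself.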

Hi Ines,

This could be the issue, quoted from the gcsfuse sources listed below:
"Not all of the usual file system features are supported."

I will keep an eye on the gcsfuse releases and run some tests when needed.

Sources:
sqlite3.OperationalError: disk I/O error · Issue #480 · GoogleCloudPlatform/gcsfuse (github.com)
gcsfuse/semantics.md at master · GoogleCloudPlatform/gcsfuse (github.com)