Error in loading a new dataset in PostgreSQL

Hi, congrats on Prodigy!

I am currently having trouble loading a dataset:

prodigy dataset new_dataset "A new dataset"

Here is my prodigy.json.

{
    "theme": "basic",
    "custom_theme": {},
    "batch_size": 10,
    "port": 8080,
    "host": "localhost",
    "cors": true,
    "db": "sqlite",
    "db_settings": {
        "sqlite": {
            "name": "prodigy.db",
            "path": "/tmp/here"
        },
        "postgresql": {
            "name": "prodigy",
            "user": "motoki"
        }
    },
    "auto_create": true,
    "show_stats": false,
    "hide_meta": false,
    "diff_style": "words",
    "html_template": false
}

If I use sqlite, I receive the following error:

Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 142, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/prodigy/__init__.py", line 4, in <module>
    from . import recipes, about  # noqa
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/prodigy/recipes/__init__.py", line 4, in <module>
    from . import ner, textcat, compare, terms, generic # noqa
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 33, in <module>
    DB = connect()
  File "cython_src/prodigy/components/db.pyx", line 25, in prodigy.components.db.connect
  File "cython_src/prodigy/components/db.pyx", line 83, in prodigy.components.db.Database.__init__
  File "cython_src/prodigy/components/db.pyx", line 60, in prodigy.components.db.connect_sqlite
TypeError: unsupported operand type(s) for /: 'str' and 'str'

When using postgres, loading a dataset works the first time, but I get the following error if I try to load another:

Traceback (most recent call last):
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 3768, in execute_sql
    cursor.execute(sql, params or ())
psycopg2.ProgrammingError: relation "dataset" already exists


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 142, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/prodigy/__init__.py", line 4, in <module>
    from . import recipes, about  # noqa
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/prodigy/recipes/__init__.py", line 4, in <module>
    from . import ner, textcat, compare, terms, generic # noqa
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 33, in <module>
    DB = connect()
  File "cython_src/prodigy/components/db.pyx", line 25, in prodigy.components.db.connect
  File "cython_src/prodigy/components/db.pyx", line 86, in prodigy.components.db.Database.__init__
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 3855, in create_tables
    create_model_tables(models, fail_silently=safe)
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 5294, in create_model_tables
    m.create_table(**create_table_kwargs)
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 4975, in create_table
    db.create_table(cls)
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 3852, in create_table
    return self.execute_sql(*qc.create_table(model_class, safe))
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 3775, in execute_sql
    self.commit()
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 3598, in __exit__
    reraise(new_type, new_type(*exc_args), traceback)
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 135, in reraise
    raise value.with_traceback(tb)
  File "/Users/apewu/writelab/prodigy/lib/python3.6/site-packages/peewee.py", line 3768, in execute_sql
    cursor.execute(sql, params or ())
peewee.ProgrammingError: relation "dataset" already exists

Thanks!

Looks like we have a bug in the way the path is passed into the database: the DB is expecting a pathlib.Path object, but is getting a string.
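
For illustration, the TypeError in the first traceback is what Python raises when the `/` operator is applied to two plain strings. Here is a minimal sketch of the difference, using the values from the prodigy.json above. The variable names are made up for the example and this is not Prodigy's actual code:

```python
from pathlib import Path

# Values taken from the "sqlite" block of the prodigy.json above.
db_dir = "/tmp/here"
db_name = "prodigy.db"

# "/" is not defined between two plain strings, which reproduces
# the TypeError shown in the traceback.
try:
    db_dir / db_name
    error_message = None
except TypeError as err:
    error_message = str(err)

# Wrapping the directory in pathlib.Path turns "/" into a path join.
db_path = Path(db_dir) / db_name

print(error_message)
print(db_path)
```

So as long as the directory is converted to a `Path` before it is joined with the file name, a string coming from the JSON config is fine.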

Until we push the next version, could you use an SQLite database in the default location by removing the db_settings key from your prodigy.json?

Thanks! Works now.

Fixed the path error (simple typo, sorry about that) and it will be included in the next release.

Btw, another suggestion for the workaround: if you want the database in a custom location, you could also use the default location in the Prodigy settings, but change the PRODIGY_HOME via an environment variable. But this means that you’d also have to move the prodigy.json there (assuming you’re loading it from your Prodigy home and not your current working directory).
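
To make the idea concrete, here is a rough sketch of how a home directory can be resolved from an environment variable. The helper `resolve_home`, the `~/.prodigy` fallback and the `/data/prodigy` path are assumptions for the example, not Prodigy's real implementation:

```python
import os
from pathlib import Path


def resolve_home(env=None):
    """Return the settings directory (hypothetical helper, not Prodigy's API)."""
    env = os.environ if env is None else env
    custom = env.get("PRODIGY_HOME")
    if custom:
        return Path(custom).expanduser()
    # Assumed default location when the variable is not set.
    return Path.home() / ".prodigy"


# Simulate PRODIGY_HOME pointing at a custom directory.
home = resolve_home({"PRODIGY_HOME": "/data/prodigy"})

# Both the config file and the default database would then be looked up here.
config_path = home / "prodigy.json"
db_path = home / "prodigy.db"

print(config_path)
print(db_path)
```

The point is that everything resolved relative to the home directory moves together, which is why the prodigy.json has to move as well.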

About the PostgreSQL error: Will look into that! Prodigy is using peewee as its ORM, so this should hopefully be easy to debug. Did a quick search for that error and my guess is that it’s currently trying to re-create the database relation every time it connects, without checking if it already exists. This doesn’t seem to bother SQLite, but seems to make a difference for PostgreSQL.


Just fixed the PostgreSQL error and tested it locally – should all work fine now and the fix will be included in the next release :tada:

Turns out the database needed a safe=True to ensure that tables weren’t re-added. While debugging, I also discovered a PostgreSQL-specific inconsistency with how BlobFields are interpreted (“bytea” instead of “blob”, see here for details). Those fields are used for project meta and examples, i.e. dumped JSON strings. So Prodigy now handles those correctly as well.
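
For anyone curious about the general pattern: as far as I understand it, the safe option makes the ORM emit a guarded `CREATE TABLE IF NOT EXISTS` instead of a plain `CREATE TABLE`. The sketch below shows that difference with raw SQL. It uses the standard-library sqlite3 module and an in-memory database only so that it runs without a PostgreSQL server or peewee installed, and the column layout is invented for the example, so it illustrates the SQL-level behaviour rather than Prodigy's or peewee's internals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First connection: the table does not exist yet, so this succeeds.
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT)")

# An unguarded second attempt fails with an "already exists" error,
# comparable to the 'relation "dataset" already exists' message above.
try:
    conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT)")
    unguarded_error = None
except sqlite3.OperationalError as err:
    unguarded_error = str(err)

# The guarded form is a no-op when the table is already there.
conn.execute(
    "CREATE TABLE IF NOT EXISTS dataset (id INTEGER PRIMARY KEY, name TEXT)"
)

tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(unguarded_error)
print(tables)
conn.close()
```

With the guarded statement, connecting any number of times leaves exactly one dataset table in place.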
