dataset error

I am getting an error saying that no table called “dataset” exists. I originally ran Prodigy on my server without PRODIGY_HOME set, so it created a .prodigy folder on the Linux server. However, I want the .prodigy folder to live on a shared drive, so I pointed PRODIGY_HOME at that file share and tried to run the prodigy dataset command, but got this error:

(ef) [jeweinbe@pmc-pia-ap9d .prodigy]$ prodigy dataset my_set "a test set" --author Jason
Traceback (most recent call last):
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
sqlite3.OperationalError: no such table: dataset

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda/envs/ef/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda/envs/ef/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/prodigy/__main__.py", line 230, in <module>
    plac.call(commands[command], arglist=args, eager=False)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/prodigy/__main__.py", line 39, in dataset
    if set_id in DB:
  File "cython_src/prodigy/components/db.pyx", line 108, in prodigy.components.db.Database.__contains__
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 4988, in get
    return sq.get()
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3220, in get
    return next(clone.execute())
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3274, in execute
    self._qr = ResultWrapper(model_class, self._execute(), query_meta)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 2939, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3837, in execute_sql
    self.commit()
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3656, in __exit__
    reraise(new_type, new_type(*exc_args), traceback)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 135, in reraise
    raise value.with_traceback(tb)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
peewee.OperationalError: no such table: dataset

Thanks for the report! It looks like Prodigy somehow fails to create the tables on initialisation, which leads to this error down the line. The way the ORM is currently structured isn’t 100% perfect yet, so there might be a bug somewhere (if so, sorry about that!).

Some suggestions to help debug this:

  • A prodigy.db should have been created in your PRODIGY_HOME directory. Could you try removing or renaming it and running the command again, so that Prodigy recreates it? Maybe your database was corrupted in the process.
  • What happens if you set the database name to ":memory:" in your database settings in the prodigy.json? This will only store the database in memory for debugging purposes – but if this fails, too, this might indicate a bug in the database that goes beyond basic path and setup issues.
"db_settings": {"sqlite": {"name": ":memory:"}}
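For the first suggestion, renaming is safer than deleting, since the old file can be restored if it turns out to hold annotations. A minimal sketch, using only the standard library; backup_prodigy_db is a hypothetical helper and not part of Prodigy, and it assumes the database file is named prodigy.db as described above:

```python
import time
from pathlib import Path


def backup_prodigy_db(home: Path, db_name: str = "prodigy.db"):
    """Rename an existing database file so a fresh one can be created.

    Returns the backup path, or None if there was nothing to rename.
    """
    db_path = home / db_name
    if not db_path.exists():
        return None
    # Timestamped suffix, so earlier backups are unlikely to be overwritten.
    backup_path = db_path.with_name(f"{db_name}.{int(time.time())}.bak")
    db_path.rename(backup_path)
    return backup_path
```

You would call it with the directory PRODIGY_HOME points to, e.g. backup_prodigy_db(Path("/path/to/home")), and then run the prodigy dataset command again.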

You can also try accessing the database component from Python, and check if it can connect successfully. In this example, I’m using both the explicit name and path to make sure it definitely uses the right file:

from prodigy.components.db import connect
database = connect('sqlite', {'name': 'prodigy.db', 'path': '/path/to/home'})
database.db.get_tables()  # expected: ['dataset', 'example', 'link']
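If you want to rule out Prodigy itself, you can also inspect the file directly with Python's built-in sqlite3 module. This is just a sketch that doesn't touch Prodigy at all; list_tables is a hypothetical helper, and the path is a placeholder for wherever your prodigy.db lives:

```python
import sqlite3


def list_tables(db_file: str) -> list:
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_file)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Usage (placeholder path): list_tables("/path/to/home/prodigy.db")
```

If the result is an empty list, the file exists but the tables were never created, which would match the "no such table: dataset" error. Note that sqlite3.connect creates an empty file if the path doesn't exist yet, so double-check the path first.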

@jeweinb Did you ever find a solution? I seem to have the same problem when deploying Prodigy in an Azure Container Instance and using Azure Files as the mount for the PRODIGY_HOME directory to save the annotations. If I set the database name to :memory:, the application works.

Hi @Aleiny,

If you're deploying in a container, I think you probably don't want to use the SQLite database? Usually you'll want to assume that the home directory is completely transient for a container instance, so storing all the annotations there likely isn't the answer.

I'd suggest launching whatever the most convenient managed SQL DB is on Azure, and then passing the connection information into your container using environment variables (at the very least, you probably want to use an environment variable for the database password).
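As a rough sketch of the environment variable approach: the PRODIGY_DB_* variable names below are made up for illustration, and the dictionary keys (dbname, user, password, host, port) assume the settings end up being forwarded to the PostgreSQL driver as keyword arguments, so check them against your setup.

```python
import os


def postgres_settings_from_env(env=os.environ) -> dict:
    """Build a PostgreSQL settings dict from environment variables.

    The PRODIGY_DB_* names are hypothetical and only used in this sketch.
    """
    settings = {
        "dbname": env.get("PRODIGY_DB_NAME", "prodigy"),
        "user": env.get("PRODIGY_DB_USER", "prodigy"),
        "host": env.get("PRODIGY_DB_HOST", "localhost"),
        "port": int(env.get("PRODIGY_DB_PORT", "5432")),
    }
    # No default for the password: raise KeyError if it was not injected.
    settings["password"] = env["PRODIGY_DB_PASSWORD"]
    return settings
```

The resulting dictionary could then be passed as the settings argument of connect, with 'postgresql' as the database ID, in the same way as the SQLite example earlier in this thread. Failing on a missing password is deliberate here, so a misconfigured container stops at startup rather than at the first write.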

Hi @honnibal,

Yeah, we ended up connecting to a PostgreSQL DB on Azure and it's working perfectly, thanks.
A bit off topic, but we noticed that the mobile UI for ner.manual and ner.make-gold isn't really working properly if someone wants to annotate a word that is more than 1 span. Is it on the roadmap to make this more user friendly, or should we stick to a desktop when working with these recipes?

Manual span selection on touch devices is a little tricky, due to how different devices handle text selection and highlighting. On touch screen devices, you should be able to swipe across the tokens (start token to end token) to highlight a span, as of Prodigy v1.7. Even if you swipe diagonally across the screen, what matters is the tokens you start and end on.

You could also try setting "ner_manual_require_click": true, which will add a + button to the top bar. You can then select text (however that's done on the device, e.g. press and hold and drag those little blue bubbles) and hit that button to "lock in" the selection. That's a bit more "native", but it can also get annoying and a bit tedious.