dataset error

jeweinb · November 21, 2017, 4:35pm

I am getting an error that no table exists called "dataset". I originally ran Prodigy on my server without PRODIGY_HOME set, so it created a .prodigy folder on the Linux server. However, I want the .prodigy folder to live on a shared drive, so I set PRODIGY_HOME to that file share and tried to run the prodigy dataset command, but got this error.

(ef) [jeweinbe@pmc-pia-ap9d .prodigy]$ prodigy dataset my_set "a test set" --author Jason
Traceback (most recent call last):
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
sqlite3.OperationalError: no such table: dataset

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda/envs/ef/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda/envs/ef/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/prodigy/__main__.py", line 230, in <module>
    plac.call(commands[command], arglist=args, eager=False)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/prodigy/__main__.py", line 39, in dataset
    if set_id in DB:
  File "cython_src/prodigy/components/db.pyx", line 108, in prodigy.components.db.Database.__contains__
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 4988, in get
    return sq.get()
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3220, in get
    return next(clone.execute())
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3274, in execute
    self._qr = ResultWrapper(model_class, self._execute(), query_meta)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 2939, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3837, in execute_sql
    self.commit()
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3656, in __exit__
    reraise(new_type, new_type(*exc_args), traceback)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 135, in reraise
    raise value.with_traceback(tb)
  File "/opt/anaconda/envs/ef/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
peewee.OperationalError: no such table: dataset

ines · November 21, 2017, 5:13pm

Thanks for the report! It looks like Prodigy somehow fails to create the tables on initialisation, which leads to this error down the line. The way the ORM is currently structured isn't 100% perfect yet, so there might be a bug somewhere (if so, sorry about that!).

Some suggestions to help debug this:

A prodigy.db file should have been created in your PRODIGY_HOME directory. Could you try removing or renaming it and running the command again, so Prodigy will recreate it? Maybe your database was corrupted in the process.
What happens if you set the database name to ":memory:" in the database settings in your prodigy.json? This will only store the database in memory for debugging purposes, but if this fails too, it might indicate a bug in the database that goes beyond basic path and setup issues.

"db_settings": {"sqlite": {"name": ":memory:"}}
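If you'd rather not edit the file by hand, the same change can be sketched in Python. This is only a rough helper under some assumptions: `set_in_memory_db` is a made-up name, not part of Prodigy, and it simply merges the setting above into whatever JSON file you point it at, leaving other keys alone.

```python
import json
from pathlib import Path


def set_in_memory_db(config_path):
    """Merge the ':memory:' SQLite setting into a prodigy.json file.

    Hypothetical helper for debugging only, not a Prodigy API.
    """
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from an empty dict.
    config = json.loads(path.read_text(encoding="utf8")) if path.exists() else {}
    # Create the nested "db_settings" -> "sqlite" dicts if they are missing.
    sqlite_settings = config.setdefault("db_settings", {}).setdefault("sqlite", {})
    sqlite_settings["name"] = ":memory:"
    path.write_text(json.dumps(config, indent=2), encoding="utf8")
    return config
```

Remember to revert the setting afterwards, since nothing is persisted while the database lives in memory.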

You can also try accessing the database component from Python, and check if it can connect successfully. In this example, I’m using both the explicit name and path to make sure it definitely uses the right file:

from prodigy.components.db import connect
database = connect('sqlite', {'name': 'prodigy.db', 'path': '/path/to/home'})
database.db.get_tables()  # expected: ['dataset', 'example', 'link']
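To rule out Prodigy itself, you could also inspect the file directly with Python's built-in sqlite3 module. The following is a minimal sketch, not anything Prodigy ships: `resolve_db_path` and `list_tables` are hypothetical helpers, and the fallback to ~/.prodigy assumes the default home directory described earlier in this thread.

```python
import os
import sqlite3


def resolve_db_path(env=None):
    """Guess where prodigy.db lives (assumes PRODIGY_HOME or ~/.prodigy)."""
    env = os.environ if env is None else env
    home = env.get("PRODIGY_HOME") or os.path.join(os.path.expanduser("~"), ".prodigy")
    return os.path.join(home, "prodigy.db")


def list_tables(db_path):
    """Return the table names in a SQLite file, or None if the file is missing."""
    # Check first: sqlite3.connect() would silently create a missing file.
    if not os.path.isfile(db_path):
        return None
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables(resolve_db_path())` with PRODIGY_HOME set should show whether the file on the shared drive exists at all and whether it contains the tables. An empty list would mean the file was created but the tables never were.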

Aleiny · December 16, 2019, 3:42pm

@jeweinb Did you ever find a solution? I seem to have the same problem when deploying Prodigy in an Azure Container Instance and using Azure Files as the mount for the PRODIGY_HOME directory to save the annotations. If I set the database name to :memory:, the application works.

honnibal · December 17, 2019, 8:22pm

Hi @Aleiny,

If you're deploying in a container, I think you probably don't want to use the SQLite database? Usually you'll want to assume that the home directory is completely transient for a container instance, so storing all the annotations there likely isn't the answer.

I suggest you probably want to launch whatever the most convenient managed SQL DB is on Azure, and then pass the connection information into your container using environment variables (at least, probably you want to use an environment variable for the db password).
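As a rough sketch of what that could look like, the snippet below builds the database part of a config from environment variables. The variable names (PRODIGY_DB_NAME and so on) and the helper `postgres_config_from_env` are invented for illustration, and the layout of the "postgresql" block is an assumption modelled on the "db_settings" format shown earlier in this thread, so check it against the Prodigy docs before relying on it.

```python
import os


def postgres_config_from_env(env=None):
    """Build hypothetical Prodigy DB settings from environment variables."""
    env = os.environ if env is None else env
    settings = {
        # Required values: a missing variable raises KeyError early.
        "dbname": env["PRODIGY_DB_NAME"],
        "user": env["PRODIGY_DB_USER"],
        "password": env["PRODIGY_DB_PASSWORD"],
        # Optional values with common PostgreSQL defaults.
        "host": env.get("PRODIGY_DB_HOST", "localhost"),
        "port": int(env.get("PRODIGY_DB_PORT", "5432")),
    }
    # Assumed layout, mirroring the "db_settings" example for SQLite.
    return {"db": "postgresql", "db_settings": {"postgresql": settings}}
```

The resulting dict could be written to the container's prodigy.json at startup, so the password never has to be baked into the image.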

Aleiny · December 18, 2019, 1:43pm

Hi @honnibal,

Yeah, we ended up connecting to a PostgreSQL DB on Azure and it's working perfectly, thanks.
A bit off topic, but we noticed that the mobile UI for ner.manual and ner.make-gold isn't really working properly if someone wants to annotate a span that covers more than one word. Is it on the roadmap to make this more user friendly, or should we stick to a desktop when working with these recipes?

ines · December 18, 2019, 2:40pm

Manual span selection on touch devices is a little tricky, due to how different devices handle text selection and highlighting. On touch screen devices, you should be able to swipe across the tokens (start token to end token) to highlight a span, as of Prodigy v1.7. Even if you swipe diagonally across the screen, what matters is the tokens you start and end on.

You could also try setting "ner_manual_require_click": true, which will add a + button to the top bar. You can then select text (however that's done on the device, e.g. press and hold and drag those little blue bubbles) and hit that button to "lock in" the selection. That's a bit more "native", but it can also get annoying and a bit tedious.
