I'm having issues with a remote PostgreSQL database. The issues appear to be connection-related: I can connect and export data via Python, but when I run Prodigy it just hangs.
I'm really not sure what's going on or how to debug this, so I'm going to dump everything I know.
Database:
- PostgreSQL 13.10
- AWS RDS db.t3.small, 30GB storage
- Currently about 2.5GB of "data" in the database
- When I export all the datasets to json files I get about 100MB of data
DB settings:
{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "dbname": "{{db_name}}",
      "user": "{{db_user}}",
      "password": "{{db_password}}"
    }
  }
}
# ENV
PGHOST=dbname.etc.rds.amazonaws.com
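Since the host is not in db_settings, it presumably comes from the PGHOST environment variable, which libpq-based drivers fall back to when no host is passed explicitly. A minimal sketch for checking which settings the process would actually end up with on a given machine. The helper name `effective_pg_settings` is hypothetical (it is not part of Prodigy), and the merge only approximates what the driver does:

```python
import os


def effective_pg_settings(db_settings, env=None):
    """Merge explicit settings with libpq-style environment fallbacks.

    Hypothetical helper for debugging only; not a Prodigy or psycopg2 API.
    """
    env = os.environ if env is None else env
    # Standard libpq environment variables and the setting each one backs.
    fallbacks = {
        "host": "PGHOST",
        "port": "PGPORT",
        "dbname": "PGDATABASE",
        "user": "PGUSER",
    }
    merged = dict(db_settings)
    for key, var in fallbacks.items():
        if key not in merged and var in env:
            merged[key] = env[var]
    # Drop the password so the result is safe to print or paste into a post.
    merged.pop("password", None)
    return merged
```

Printing `effective_pg_settings(...)` with the "postgresql" block from prodigy.json on both the working and the failing machine would show whether they resolve to the same host.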
Accessing DB
I can connect and export data fine using python, e.g.
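import json

from prodigy.components.db import connect

db = connect()  # picks up the db settings from prodigy.json
for dataset in db.datasets:
    data = db.get_dataset_examples(dataset)
    if data:  # skip empty datasets
        # ensure_ascii=False can write non-ASCII text, so set the encoding explicitly
        with open(f"{dataset}.json", mode="w", encoding="utf-8") as writer:
            json.dump(data, writer, ensure_ascii=False, indent=4)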
Now if I try to run a Prodigy recipe, e.g.
prodigy ner.manual new_data_set blank:en some_file.txt --label X,Y,Z,etc.
it hangs:
Using 16 label(s): NOUN, VERB, PRONOUN, EVENT, IWI-HAPU, LANGUAGE, LAW, LOC,
ORG, PERSON, TANGATA, TIME, WAKA, MAHITOI, AUA, PRODUCT
^X^C^X^CTraceback (most recent call last):
File "/webapp/lib/python3.10/site-packages/prodigy/__main__.py", line 62, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 384, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "cython_src/prodigy/core.pyx", line 73, in prodigy.core.Controller.from_components
File "cython_src/prodigy/core.pyx", line 164, in prodigy.core.Controller.__init__
File "cython_src/prodigy/core.pyx", line 247, in prodigy.core.Controller.get_dataset_named_sessions
File "/webapp/lib/python3.10/site-packages/prodigy/components/db.py", line 257, in sessions
return [ds.name for ds in datasets]
File "/webapp/lib/python3.10/site-packages/prodigy/components/db.py", line 257, in <listcomp>
return [ds.name for ds in datasets]
File "/webapp/lib/python3.10/site-packages/peewee.py", line 4543, in next
self.cursor_wrapper.iterate()
File "/webapp/lib/python3.10/site-packages/peewee.py", line 4463, in iterate
result = self.process_row(row)
File "/webapp/lib/python3.10/site-packages/peewee.py", line 7706, in process_row
data = super(ModelObjectCursorWrapper, self).process_row(row)
File "/webapp/lib/python3.10/site-packages/peewee.py", line 7672, in process_row
result[attr] = converters[i](row[i])
File "/webapp/lib/python3.10/site-packages/peewee.py", line 4707, in python_value
return value if value is None else self.adapt(value)
File "/webapp/lib/python3.10/site-packages/peewee.py", line 4856, in adapt
if isinstance(value, text_type):
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/webapp/lib/python3.10/site-packages/prodigy/__main__.py", line 62, in <module>
controller = recipe(*args, use_plac=True)
KeyboardInterrupt
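The interrupt in the traceback above landed while peewee was converting rows inside `sessions`, which may mean the process was busy reading a large number of rows rather than waiting on the connection. A minimal sketch for checking that from Python, assuming `db.sessions` can be read as a property the same way `db.datasets` is (the traceback suggests so, but I have not confirmed it). `time_db_listing` is a made-up helper name:

```python
import time


def time_db_listing(db, attrs=("datasets", "sessions")):
    """Return {attr: (row_count, seconds)} for each listing attribute on db.

    Hypothetical debugging helper; `db` is whatever connect() returns.
    """
    results = {}
    for attr in attrs:
        start = time.perf_counter()
        names = list(getattr(db, attr))  # forces the query and row iteration
        results[attr] = (len(names), time.perf_counter() - start)
    return results


# Intended usage (needs Prodigy and the real database):
#   from prodigy.components.db import connect
#   print(time_db_listing(connect()))
```

If the session count is very large or the timing is far higher on this machine than on the other one, that would narrow down where the startup time goes.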
I had to interrupt the command (hence the KeyboardInterrupt): the instance running Prodigy becomes slow and eventually unresponsive, and Prodigy never actually starts. Here's what RDS reports while the prodigy command is running.
Something happens! It connects, etc., but it never gets to the running stage. If I run prodigy stats I get similar behavior.
We've never had this problem before. What's changed since "before"? There is more data in the database, and I updated to the latest Prodigy version. I've since upgraded the PostgreSQL version and bumped up the instance size, but that didn't resolve the issue. If I run Prodigy against a local database, everything runs fine.
Update
I can connect to the database and run Prodigy from another machine, which leads me to suspect something's up with the EC2 instance I'm using to run Prodigy. I'm using the Ubuntu 20.04 LTS Machine AMI (a variant published by AWS for machine learning). Having said that, it still takes a long time for Prodigy to "start up" there too.
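To compare the two machines at the network level, something like the following could time a bare TCP connection to the database host, independent of Prodigy and of any query. This is only a sketch: it assumes the default PostgreSQL port 5432, and `tcp_connect_seconds` is a name I made up for illustration.

```python
import socket
import time


def tcp_connect_seconds(host, port=5432, timeout=5.0):
    """Return the seconds taken to open (and then close) a TCP connection.

    Raises OSError (including timeouts) if the host cannot be reached.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start


# Intended usage, with the host taken from the environment:
#   import os
#   print(tcp_connect_seconds(os.environ["PGHOST"]))
```

Similar numbers on both machines would point away from basic connectivity and towards the amount of data being read or the resources on the instance itself.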
Any help, even with further debugging, would be greatly appreciated.