Thanks Ines for your help and follow-up. I’ve tried your trick of moving the database calls before loading the spaCy model in the ner.make-gold
recipe. I had high hopes, but it didn’t solve the problem. I think this is because there are DB calls not only in that recipe but also afterwards. The DB calls from the recipe itself are OK, but the subsequent ones fail. So I think we are definitely facing a bug in the database connection management, where the connection hangs at a certain point.
Here are the logs and stack trace of the error (with JSON results cut for confidentiality):
/www/app # python3 -m prodigy ner.make-gold en_ner_1 en_core_web_lg
22:01:44 - DB: Initialising database PostgreSQL
22:01:46 - DB: Connecting to database PostgreSQL
{'host': 'prodigy.cqu0bffnxyej.eu-west-1.rds.amazonaws.com', 'dbname': 'prodigy', 'user': 'prodigy', 'password': 'pK7XB8KsjmmL3.HyLDDt'}
22:01:46 - RECIPE: Calling recipe 'ner.make-gold'
22:01:47 - DB: Loading dataset 'en_ner_1' (17 examples)
22:01:48 - RECIPE: Starting recipe ner.make-gold
{'task_hashes': {-945020926, 1397511014, ...}, 'examples': [{'text': "Sanofi devient"
, 'label': [], 'spacy_model': 'en_core_web_lg', 'dataset': 'en_ner_1'}
22:01:55 - MODEL: Added sentence boundary detector to model pipeline
['sbd', 'tagger', 'parser', 'ner']
22:01:55 - RECIPE: Initialised EntityRecognizer with model en_core_web_lg
{'lang': 'en', 'pipeline': ['sbd', 'tagger', 'parser', 'ner'], 'accuracy': {'token_acc': 99.8890484271, 'ents_p': 85.540697997, 'ents_r': 86.1621863298, 'uas': 91.8900594047, 'tags_acc': 97.2044842264, 'ents_f': 85.8503174073, 'las': 90.0726533777}, 'name': 'core_web_lg', 'license': 'CC BY-SA 3.0', 'author': 'Explosion AI', 'url': 'https://explosion.ai', 'vectors': {'width': 300, 'vectors': 684831, 'keys': 684830}, 'sources': ['OntoNotes 5', 'Common Crawl'], 'version': '2.0.0', 'spacy_version': '>=2.0.0a18', 'parent_package': 'spacy', 'speed': {'gpu': None, 'nwords': 291344, 'cpu': 5023.1042787614}, 'email': 'contact@explosion.ai', 'description': 'English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.'}
22:01:55 - CONTROLLER: Initialising from recipe
{'config': {'lang': 'en', 'label': 'all', 'dataset': 'en_ner_1', 'host': '0.0.0.0', 'db': 'postgresql', 'db_settings': {'sqlite': {'name': 'prodigy.db', 'path': '/var/www/'}, 'postgresql': {'host': 'prodigy.cqu0bffnxyej.eu-west-1.rds.amazonaws.com', 'dbname': 'prodigy', 'user': 'prodigy', 'password': 'pK7XB8KsjmmL3.HyLDDt'}}}, 'dataset': 'en_ner_1', 'db': True, 'exclude': None, 'get_session_id': None, 'on_exit': None, 'on_load': None, 'progress': <prodigy.components.progress.ProgressEstimator object at 0x7fcc68085320>, 'self': <prodigy.core.Controller object at 0x7fcc68085278>, 'stream': <generator object at 0x7fcc66b00510>, 'update': None, 'view_id': 'ner'}
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
cursor.execute(sql, params or ())
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/site-packages/prodigy/__main__.py", line 248, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 161, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "cython_src/prodigy/core.pyx", line 36, in prodigy.core.Controller.__init__
File "cython_src/prodigy/core.pyx", line 125, in prodigy.core.Controller.connect_db
File "cython_src/prodigy/components/db.pyx", line 107, in prodigy.components.db.Database.__contains__
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 4988, in get
return sq.get()
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3220, in get
return next(clone.execute())
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3274, in execute
self._qr = ResultWrapper(model_class, self._execute(), query_meta)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 2939, in _execute
return self.database.execute_sql(sql, params, self.require_commit)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3837, in execute_sql
self.commit()
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3656, in __exit__
reraise(new_type, new_type(*exc_args), traceback)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 135, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
cursor.execute(sql, params or ())
peewee.OperationalError: SSL SYSCALL error: EOF detected
/www/app #
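In case it helps to narrow this down, here is a rough sketch of the kind of reconnect-and-retry I would expect around the call that fails in `Controller.connect_db`. It is not Prodigy's or peewee's actual code: `StaleConnectionError`, `FakeDB` and `run_with_reconnect` are names I made up for illustration, and the fake database only simulates a connection that went stale while the model was loading.

```python
import time


class StaleConnectionError(Exception):
    """Hypothetical stand-in for peewee.OperationalError in this sketch."""


def run_with_reconnect(query, reconnect, retries=1, delay=0.0,
                       errors=(StaleConnectionError,)):
    """Call query(); on a connection error, call reconnect() and try again.

    Re-raises the last error once `retries` reconnect attempts are used up.
    """
    attempt = 0
    while True:
        try:
            return query()
        except errors:
            if attempt >= retries:
                raise
            attempt += 1
            reconnect()
            time.sleep(delay)


class FakeDB:
    """Simulated database whose connection was dropped while idle."""

    def __init__(self):
        self.alive = False      # connection went stale during model loading
        self.reconnects = 0

    def reconnect(self):
        self.alive = True
        self.reconnects += 1

    def dataset_exists(self):
        if not self.alive:
            raise StaleConnectionError("SSL SYSCALL error: EOF detected")
        return True


db = FakeDB()
result = run_with_reconnect(db.dataset_exists, db.reconnect)
print(result, db.reconnects)
```

With a real peewee database I assume one would pass `errors=(peewee.OperationalError,)` and a `reconnect` callback that closes and reopens the connection, but I have not tested that against Prodigy.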
Out of curiosity, when do you plan to release the next version of Prodigy?