OperationalError: SSL SYSCALL error: EOF detected

bug
ner
database

(Kfrancoi) #1

We connected Prodigy to a PostgreSQL database on AWS RDS. With the right postgresql settings in the prodigy.json file, we were up and running quickly. Creating a database and loading documents into it didn’t present any problems.

But when we try to start the ner.make-gold recipe, Prodigy outputs the following error:

python -m prodigy ner.make-gold test_db en_core_web_lg
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
psycopg2.OperationalError: SSL SYSCALL error: EOF detected

It appears that this OperationalError: EOF detected occurs when the database connection times out. Indeed, we only get the error when our model (in this case en_core_web_lg) is a big one that takes a long time to instantiate (about 30 seconds). For smaller models (e.g. fr_core_news_md or nl_core_news_sm), ner.make-gold runs just fine.

Could there be a problem in the way you set up the connection to PostgreSQL using psycopg2?

Thanks a lot for your help.


(Ines Montani) #2

Thanks for the report! Prodigy’s database handling is powered by the peewee module, which should hopefully make this easier to debug.

The model size having an impact is pretty interesting… one possible explanation could be that the database connection times out while the model is loaded, so the subsequent calls fail (which is weird and possibly fixable). To test this, you could try editing prodigy/recipes/ner.py and moving the calls to the DB further up in the recipe so that they’re made before the model is loaded:

examples = DB.get_dataset(dataset)
task_hashes = DB.get_task_hashes(dataset)
# load the model and do everything else afterwards

Btw, speaking of Prodigy’s PostgreSQL integration: it might not be relevant to this problem, but maybe you’ll find it useful later on. This thread shows an example of connecting to a remote PostgreSQL database by creating the peewee DB manually and passing it into Prodigy’s Database. It also uses the Playhouse extension for peewee, which includes additional PostgreSQL-specific features.
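For reference, a minimal sketch of that pattern might look like this (the connection details are placeholders, and this assumes Prodigy’s Database accepts a manually created peewee database object, as shown in that thread):

```
from playhouse.postgres_ext import PostgresqlExtDatabase
from prodigy.components.db import Database

# Placeholder credentials – substitute your own RDS settings.
# The database isn't actually opened until the first query.
psql_db = PostgresqlExtDatabase(
    "prodigy",
    user="prodigy",
    password="xxx",
    host="your-instance.eu-west-1.rds.amazonaws.com",
)

# Wrap the peewee database in Prodigy's Database class
db = Database(psql_db, "postgresql", "Custom PostgreSQL Database")
```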

The upcoming version of Prodigy will include some improvements to the database connection handling, which might also help with this problem. And, finally, we’ve never really been happy with the way ner.make-gold works (e.g. the fact that it requires a raw dataset and makes multiple passes over the data). So in the upcoming version, the current ner.make-gold recipe will be replaced with a more convenient one that uses the ner_manual interface to create gold-standard data faster by correcting the model’s predictions.
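In the meantime, one workaround that might be worth trying: if Prodigy forwards the db_settings straight to the driver (peewee passes unrecognised keyword arguments on to psycopg2.connect, which accepts libpq connection parameters), you could enable TCP keepalives in your prodigy.json so the RDS connection isn’t silently dropped while the model loads. This is an assumption, and the keepalive values below are just examples:

```json
{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "host": "your-instance.eu-west-1.rds.amazonaws.com",
      "dbname": "prodigy",
      "user": "prodigy",
      "password": "xxx",
      "keepalives": 1,
      "keepalives_idle": 10,
      "keepalives_interval": 5
    }
  }
}
```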

Edit: Forgot to add another debugging tip. In case you haven’t seen it already, you can also run all Prodigy recipes and commands with the PRODIGY_LOGGING=basic or PRODIGY_LOGGING=verbose environment variable. This will log everything that’s going on, including database operations.


(Kfrancoi) #3

Thanks, Ines, for your help and follow-up. I’ve tried your trick of moving the database calls before loading the spaCy model in the ner.make-gold recipe. I had high hopes, but it didn’t solve the problem. I think that’s because the DB isn’t only called from within that recipe, but also from code that runs afterwards. The DB calls made inside the recipe succeed, but the subsequent ones fail. So I think we are definitely looking at a bug in the database connection management that hangs at a certain point.

Here are the logs and the stack trace of the error (with JSON results cut for confidentiality):

/www/app # python3 -m prodigy ner.make-gold en_ner_1 en_core_web_lg
22:01:44 - DB: Initialising database PostgreSQL
22:01:46 - DB: Connecting to database PostgreSQL
{'host': 'prodigy.cqu0bffnxyej.eu-west-1.rds.amazonaws.com', 'dbname': 'prodigy', 'user': 'prodigy', 'password': 'pK7XB8KsjmmL3.HyLDDt'}

22:01:46 - RECIPE: Calling recipe 'ner.make-gold'
22:01:47 - DB: Loading dataset 'en_ner_1' (17 examples)
22:01:48 - RECIPE: Starting recipe ner.make-gold
{'task_hashes': {-945020926, 1397511014, ...}, 'examples': [{'text': "Sanofi devient"
,  'label': [], 'spacy_model': 'en_core_web_lg', 'dataset': 'en_ner_1'}

22:01:55 - MODEL: Added sentence boundary detector to model pipeline
['sbd', 'tagger', 'parser', 'ner']

22:01:55 - RECIPE: Initialised EntityRecognizer with model en_core_web_lg
{'lang': 'en', 'pipeline': ['sbd', 'tagger', 'parser', 'ner'], 'accuracy': {'token_acc': 99.8890484271, 'ents_p': 85.540697997, 'ents_r': 86.1621863298, 'uas': 91.8900594047, 'tags_acc': 97.2044842264, 'ents_f': 85.8503174073, 'las': 90.0726533777}, 'name': 'core_web_lg', 'license': 'CC BY-SA 3.0', 'author': 'Explosion AI', 'url': 'https://explosion.ai', 'vectors': {'width': 300, 'vectors': 684831, 'keys': 684830}, 'sources': ['OntoNotes 5', 'Common Crawl'], 'version': '2.0.0', 'spacy_version': '>=2.0.0a18', 'parent_package': 'spacy', 'speed': {'gpu': None, 'nwords': 291344, 'cpu': 5023.1042787614}, 'email': 'contact@explosion.ai', 'description': 'English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.'}

22:01:55 - CONTROLLER: Initialising from recipe
{'config': {'lang': 'en', 'label': 'all', 'dataset': 'en_ner_1', 'host': '0.0.0.0', 'db': 'postgresql', 'db_settings': {'sqlite': {'name': 'prodigy.db', 'path': '/var/www/'}, 'postgresql': {'host': 'prodigy.cqu0bffnxyej.eu-west-1.rds.amazonaws.com', 'dbname': 'prodigy', 'user': 'prodigy', 'password': 'pK7XB8KsjmmL3.HyLDDt'}}}, 'dataset': 'en_ner_1', 'db': True, 'exclude': None, 'get_session_id': None, 'on_exit': None, 'on_load': None, 'progress': <prodigy.components.progress.ProgressEstimator object at 0x7fcc68085320>, 'self': <prodigy.core.Controller object at 0x7fcc68085278>, 'stream': <generator object at 0x7fcc66b00510>, 'update': None, 'view_id': 'ner'}

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
psycopg2.OperationalError: SSL SYSCALL error: EOF detected


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/site-packages/prodigy/__main__.py", line 248, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 161, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 36, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/core.pyx", line 125, in prodigy.core.Controller.connect_db
  File "cython_src/prodigy/components/db.pyx", line 107, in prodigy.components.db.Database.__contains__
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 4988, in get
    return sq.get()
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3220, in get
    return next(clone.execute())
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3274, in execute
    self._qr = ResultWrapper(model_class, self._execute(), query_meta)
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 2939, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3837, in execute_sql
    self.commit()
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3656, in __exit__
    reraise(new_type, new_type(*exc_args), traceback)
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 135, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/site-packages/peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
peewee.OperationalError: SSL SYSCALL error: EOF detected

/www/app #  

Out of curiosity, when do you plan to release the next version of Prodigy?


(Matthew Honnibal) #4

I wonder whether you might have run out of memory. That would explain the different results between the small and large models, and running out of memory is apparently a potential cause of that error.


(Max Countryman) #5

Hi,

We’re running into the same issue when trying to load JSONL files into a dataset via the prodigy db-in interface. It only seems to happen with larger JSONL files; in this case the file is just over 2,000 lines.
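As a stopgap, we’re considering splitting the file and importing it in smaller batches, one db-in call per part. A quick sketch of the splitter (the file names and chunk size are just illustrative):

```python
import itertools
from pathlib import Path

def split_jsonl(path, chunk_size=500):
    """Split a JSONL file into numbered part-files of at most
    `chunk_size` lines each, and return the paths written."""
    path = Path(path)
    parts = []
    with path.open(encoding="utf8") as f:
        # Read chunk_size lines at a time until the file is exhausted
        for i, chunk in enumerate(
            iter(lambda: list(itertools.islice(f, chunk_size)), [])
        ):
            out = path.with_name(f"{path.stem}_{i:03d}{path.suffix}")
            out.write_text("".join(chunk), encoding="utf8")
            parts.append(out)
    return parts
```

Each resulting part can then be loaded separately, e.g. `prodigy db-in my_dataset data_000.jsonl`, so no single import holds the connection open for long.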

Any help is appreciated,

Max


(Matthew Honnibal) #6

@max Have you checked your memory usage?


(Max Countryman) #7

Memory usage of what by what?

At a high level, I don’t notice anything abnormal, but if you could be more specific I can certainly dig into it.


(Matthew Honnibal) #8

I mostly meant at a high level, yeah: just watching the process in top. I think there’s also a way to report the maximum memory usage of a process, so you can catch the problem after the fact.
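For instance, Python’s standard-library resource module can report the peak resident set size after the fact. A quick sketch (note that the units of ru_maxrss differ between platforms):

```python
import resource
import sys

def peak_rss_kb():
    """Peak resident set size of the current process, in kilobytes.
    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak // 1024 if sys.platform == "darwin" else peak

print(f"Peak RSS so far: {peak_rss_kb()} KB")
```

On Linux you can also run the whole command under GNU time (`/usr/bin/time -v ...`), which prints a “Maximum resident set size” line when the process exits.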