Large number of database connections.

Hey!

Recently we started using Prodigy for sentence annotation. We use the default ner.manual recipe for that purpose and have four people working on the same dataset at the same time through the web app.

A few days ago our other services ran out of database connections, so we started investigating the problem. It turned out that Prodigy grabs hundreds of database connections (a picture is provided below), though we don't see any specific reason for that.

We first hit this issue on 1.10.4 for Linux, and updating to 1.10.5 for Linux didn't help.

We tried limiting connections to 10, but then the service doesn't even launch. Here's a traceback from our logs:

Task exception was never retrieved 
future: <Task finished name='Task-805' coro=<RequestResponseCycle.run_asgi() done, defined at /usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py:388> exception=OperationalError('FATAL:  too many connections for role "prodigy"\n')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 393, in run_asgi
    self.logger.error(msg, exc_info=exc)
  File "/usr/local/lib/python3.8/logging/__init__.py", line 1463, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/local/lib/python3.8/logging/__init__.py", line 1577, in _log
    self.handle(record)
  File "/usr/local/lib/python3.8/logging/__init__.py", line 1586, in handle
    if (not self.disabled) and self.filter(record):
  File "/usr/local/lib/python3.8/logging/__init__.py", line 807, in filter
    result = f.filter(record)
  File "cython_src/prodigy/util.pyx", line 121, in prodigy.util.ServerErrorFilter.filter
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 390, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 140, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 134, in __call__
    await self.error_middleware(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 178, in __call__
    raise exc from None
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 156, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/usr/local/lib/python3.8/site-packages/prodigy/app.py", line 198, in reset_db_middleware
    response = await call_next(request)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 73, in __call__
    raise exc from None
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 62, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 590, in __call__
    await route(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 208, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 129, in app
    raw_response = await run_in_threadpool(dependant.call, **values)
  File "/usr/local/lib/python3.8/site-packages/starlette/concurrency.py", line 25, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/prodigy/app.py", line 378, in get_session_project
    config["total"] = controller.get_total_by_session(session_id)
  File "cython_src/prodigy/core.pyx", line 187, in prodigy.core.Controller.get_total_by_session
  File "/usr/local/lib/python3.8/site-packages/prodigy/components/db.py", line 207, in __contains__
    has_ds = bool(Dataset.get(Dataset.name == name))
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 6430, in get
    return sq.get()
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 6860, in get
    return clone.execute(database)[0]
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 1898, in inner
    return method(self, database, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 1969, in execute
    return self._execute(database)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 2141, in _execute
    cursor = database.execute(self)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3142, in execute
    return self.execute_sql(sql, params, commit=commit)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3136, in execute_sql
    self.commit()
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 2902, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 185, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3127, in execute_sql
    cursor = self.cursor(commit)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3111, in cursor
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3068, in connect
    self._initialize_connection(self._state.conn)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 2902, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 185, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3065, in connect
    self._state.set_connection(self._connect())
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3761, in _connect
    conn = psycopg2.connect(database=self.database, **self.connect_params)
  File "/usr/local/lib/python3.8/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
peewee.OperationalError: FATAL:  too many connections for role "prodigy"
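For context, the way we limited connections was the standard Postgres per-role connection cap, something like this (role name as in the error above, exact limit from our setup):

```sql
-- Cap concurrent connections for the "prodigy" role; once exceeded,
-- Postgres raises: FATAL: too many connections for role "prodigy"
ALTER ROLE prodigy CONNECTION LIMIT 10;
```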

Is there any chance this could get fixed?

Thanks in advance.

Thanks for the report @defined-user-poz!

That's indeed very strange, especially that it won't even launch. :thinking:

Could you tell me what database you're using? Is it Postgres or MySQL?

And also, how many cores does your server have?

I'm trying to narrow down the hypotheses of what could be the underlying problem. :nerd_face:

We use PostgreSQL 9.6 and the server instance runs two cores.

Thanks for the details @defined-user-poz

My initial hypothesis was that too many cores might be involved (that shouldn't be a problem either, but :man_shrugging: ). Since you have two, that rules out that theory.

I just set up a whole project to try to replicate the issue, but I haven't been able to: everything works as expected.

Here's what I did:

I created a docker-compose.yml file with Postgres and Adminer:

version: '3.1'

services:
  db:
    image: postgres:9.6
    restart: always
    environment:
      POSTGRES_PASSWORD: example
      POSTGRES_DB: prodigy
    ports:
      - 5432:5432

  adminer:
    image: adminer
    restart: always
    ports:
      - 8090:8080

I started it with Docker Compose:

$ docker-compose up -d

That brings up a full PostgreSQL instance with the same version as yours, and an Adminer instance to manage and explore it.

Then I configured my ~/.prodigy/prodigy.json with:

{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "host": "localhost",
      "dbname": "prodigy",
      "user": "postgres",
      "password": "example"
    }
  }
}

Next, I started Prodigy with:

$ prodigy ner.manual ner_news_headlines blank:en ./raw_news-headlines-nyt.jsonl --label PERSON,ORG,PRODUCT,LOCATION
Using 4 label(s): PERSON, ORG, PRODUCT, LOCATION

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

Then I opened 4 tabs.

Then I quickly annotated several examples in each of those tabs and saved (with Ctrl+s) in all of them at almost the same time, so they would all write a bunch of annotations nearly simultaneously.

And during the whole process, I kept another tab open with Adminer at: http://localhost:8090/?pgsql=db&username=postgres&db=prodigy&ns=public&sql=select%20* from%20pg_stat_activity where%20datname%20%3D%20'prodigy'%3B

Executing the SQL statement:

select *
from pg_stat_activity
where datname = 'prodigy';

That shows each connection made to the prodigy database.

I kept executing/refreshing that query through the whole process, and it always showed a single connection used by Prodigy, plus 2 connections used by Adminer (which are only relevant to the monitoring setup).

When I stopped Prodigy, only the 2 Adminer connections remained.


A useful next step to debug the problem would be to replicate my experiment and then extend it with your own environment step by step, adding each extra bit of code or technology you're using, one by one, until you see the number of connections rising again.
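It could also help to double-check the limits on the database side, both the server-wide cap and any per-role cap (this is generic Postgres, nothing Prodigy-specific):

```sql
-- Server-wide connection cap
SHOW max_connections;

-- Per-role cap (-1 means no role-specific limit)
SELECT rolname, rolconnlimit FROM pg_roles WHERE rolname = 'prodigy';
```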

The fact that you couldn't even launch Prodigy after limiting the connections, not even on the first launch, is very strange. The first thing I would suspect is some kind of recursion in the system, or external tools or components you might be using.

Thanks for your support @tiangolo

Just a quick update from our side.

Recently we migrated to PostgreSQL 13.0, but unfortunately that didn't help. We also couldn't reproduce the issue locally, though trying surfaced a few other problems on our side.

We ended up putting a PgBouncer connection pooler in front of the database, proxying all Prodigy connections through it. The number of connections has stayed at one for a day now, and we hope it will remain that low.
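In case it helps anyone else, our PgBouncer setup is roughly the following (hosts, ports, and paths here are illustrative, not our exact config):

```ini
; pgbouncer.ini (sketch; values are illustrative)
[databases]
; All "prodigy" client connections get proxied to the real Postgres
prodigy = host=127.0.0.1 port=5432 dbname=prodigy

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; One server connection is shared across all Prodigy clients
pool_mode = session
default_pool_size = 1
max_client_conn = 100
```

Prodigy's db_settings then point at port 6432 instead of 5432.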

We appreciate your support a lot,
Thank you.


I'm glad you found a solution! :rocket:

I was facing the same issue, and in my case the large number of open connections was due to the health check made on the route /project/{session}, which opens a new connection without closing it after each call. Could you look into the way the Prodigy app opens connections on the /project/{session} route?

Thanks in advance.

Vincent

Hi there!

I just gave this a spin locally. I installed postgres and started spamming a local Prodigy run from the browser at /project/vincent.

When I then check postgres for the connections, it seems to be just 3:

> SELECT sum(numbackends) FROM pg_stat_database;

 sum 
-----
   3
(1 row)

It stays at three even when I make requests from different browsers, and when I add sessions or restart Prodigy with new sessions. I've also looked at the code and didn't find any reason to assume that extra connections get made at that endpoint.
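To make the hypothesis concrete, the leak pattern you're describing would look like this in miniature (a toy sketch with a fake database standing in for Postgres, not Prodigy's actual code):

```python
# Toy illustration: a handler that opens a connection per request without
# closing it leaks one connection per call, while a handler that reuses a
# single shared connection stays constant.

class FakeDB:
    """Counts open connections, standing in for a real Postgres server."""
    def __init__(self):
        self.open_connections = 0

    def connect(self):
        self.open_connections += 1
        return self

    def close(self):
        self.open_connections -= 1


def leaky_handler(db):
    conn = db.connect()  # opened per request...
    return "ok"          # ...but never closed


def reusing_handler(db, conn):
    return "ok"          # reuses the one shared connection


leaky_db = FakeDB()
for _ in range(100):
    leaky_handler(leaky_db)
print(leaky_db.open_connections)  # 100 leaked connections

shared_db = FakeDB()
shared_conn = shared_db.connect()
for _ in range(100):
    reusing_handler(shared_db, shared_conn)
print(shared_db.open_connections)  # stays at 1
```

If the endpoint really leaked like the first handler, the pg_stat_activity counts would climb with every request, which is what I'd expect to see in your measurements.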

Could you share some extra information? How did you measure the extra connections made on /project/{session}? Is there something specific that suggested to you that it's this endpoint and not another one?