Internal Server Error

I updated Prodigy from 1.9.0 to 1.9.7 and now I'm getting an Internal Server Error when trying to load the Prodigy web app.

The logs don't give much away either:

✨  Starting the web server at http://0.0.0.0:8080 ...
Open the app in your browser and start annotating!
 INFO:     10.97.17.225:37850 - "GET /?session=simon HTTP/1.1" 500 Internal Server Error

How do I debug this?

Hi @simon.gurcke,

It's quite strange that you're not getting a traceback with the error.

I tried debugging it here, injecting an artificial error in the code, to force it to throw an Internal Server Error, but when triggering it I'm still getting the full traceback leading to the point where I injected the error. It seems the fact that you are not getting any extra info about the error, any traceback, would be the first to debug.

Which version of Python are you using?

Also, are you running it behind a proxy/load balancer? If so, do you get an error when running locally, directly in your machine?

Thanks for your response @tiangolo. I'm running Prodigy in a Docker container orchastrated by Kubernetes on AWS. So in fact there is a load balancer in between. And I'm using Python 3.7.

I can't easily test this locally as I'm making requests to Kubernetes internal services to get predictions.

Rolling Prodigy back to 1.9.0 fixes the error and I can use Prodigy. If I inject an artificial error I never get a traceback. Always just 500 Internal Server Error.

Is there anything I can do to help resolve this?

Humm, do you know what load balancer is running on top? Can you get the logs from the load balancer?

It's because I think this log:

wouldn't show up if you were running locally. So I suspect it could actually come from the load balancer and not Prodigy (even if the error was in the code related to Prodigy), and that might be hiding the actual error underneath.

Also because given it's a server error, it wouldn't show up in an INFO but in an ERROR log.

Are you running Prodigy directly or do you have a custom recipe in Python code?

Do you control the Dockerfile for that container? Can you add more code to it?

If so, you could try: https://github.com/Qix-/better-exceptions

Another option to try is to test with a small FastAPI app instead of Prodigy in that container and see if you can inject an error and get a traceback, that would help finding out where the error can be, if it's related to configs in Prodigy, the environment, etc.

Unrelated note: just checked/reconfigured my settings here to make sure I get notifications for your replies :sweat_smile:

Okay, if I run Prodigy locally it gives me the expected traceback.

I don't understand how a load balancer makes a difference. Eventually the request would hit the Prodigy code or my custom recipe code and that's where the exception occurs and should be logged. Or am I missing something?

Yes, but if the INFO log is not shown locally, then it would probably mean that it's coming from somewhere else, not from Prodigy, possibly the load balancer. It could mean that, for example, you are getting the logs of the load balancer but not the logs of the container behind it. Or maybe the container with Prodigy could be configured with an internal load balancer, e.g. an Nginx with a proxy pass to Prodigy, on the same container.

Something else you could try is adding something like Sentry as part of your recipe to log the error with the traceback to a remote server, that could work and help you figure out what's happening independent of how the logs are being handled in your cluster.

I see what you mean, however there are other log entries that clearly come from Prodigy. So if Prodigy was logging something related to the exception I assume it should appear in the same place.

Sentry is a good idea, I will try that tomorrow.

1 Like

Prodigy has its own logger, and all other log statements are produced by uvicorn. I think uvicorn also outputs log statements from other applications if they just log via logging.info. So that's whar could be happening here? But yeah, Sentry is probably the most systematic approach here.

1 Like

I've finally gotten around to doing more testing regarding this.

I'm now able to reproduce this locally when running Prodigy in a Docker container. @tiangolo you were right that the log entry stating the Internal Server Error is indeed coming from the nginx Ingress in Kubernetes, as this does not appear when running locally.

The frontend still receives an Internal Server Error and the logs are not showing anything. Only when manually exiting out of Prodigy (using Ctrl-C in terminal) the traceback appears. I also tried Sentry but it doesn't log anything at all.

I suspect this has something to do with the exception not bubbling up from an async worker until the worker is exited forcefully. Sometimes that can happen when the exception can't be pickled. Just guessing here. Weird though that this only happens in Docker (Linux) and not when running on macOS.

Not sure if this is Prodigy-specific or something that's happening upstream (FastAPI / uvicorn), but it definitely needs to be resolved.

For reference, here's the traceback I get after pressing Ctrl-C. Maybe this helps? (The error itself is no concern here, just that it doesn't appear in the logs at the time it occurs).

✨  Starting the web server at http://0.0.0.0:8080 ...
Open the app in your browser and start annotating!

^CException during reset or similar
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 693, in _finalize_fairy
    fairy._reset(pool)
File "/opt/conda/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 880, in _reset
    pool._dialect.do_rollback(self)
File "/opt/conda/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 538, in do_rollback
    dbapi_connection.rollback()
psycopg2.OperationalError: SSL SYSCALL error: EOF detected

Task exception was never retrieved
future: <Task finished coro=<RequestResponseCycle.run_asgi() done, defined at /opt/conda/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py:383> exception=KeyError('label')>
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 249, in __step
    result = coro.send(None)
File "/opt/conda/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 388, in run_asgi
    self.logger.error(msg, exc_info=exc)
File "/opt/conda/lib/python3.7/logging/__init__.py", line 1407, in error
    self._log(ERROR, msg, args, **kwargs)
File "/opt/conda/lib/python3.7/logging/__init__.py", line 1514, in _log
    self.handle(record)
File "/opt/conda/lib/python3.7/logging/__init__.py", line 1523, in handle
    if (not self.disabled) and self.filter(record):
File "/opt/conda/lib/python3.7/logging/__init__.py", line 751, in filter
    result = f.filter(record)
File "cython_src/prodigy/util.pyx", line 120, in prodigy.util.ServerErrorFilter.filter
File "/opt/conda/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
    result = await app(self.scope, self.receive, self.send)
File "/opt/conda/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/fastapi/applications.py", line 140, in __call__
    await super().__call__(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/applications.py", line 134, in __call__
    await self.error_middleware(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/errors.py", line 178, in __call__
    raise exc from None
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/errors.py", line 156, in __call__
    await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/cors.py", line 140, in simple_response
    await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
File "/opt/conda/lib/python3.7/site-packages/prodigy/app.py", line 183, in reset_db_middleware
    response = await call_next(request)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
File "/opt/conda/lib/python3.7/asyncio/futures.py", line 181, in result
    raise self._exception
File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 251, in __step
    result = coro.throw(exc)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/exceptions.py", line 73, in __call__
    raise exc from None
File "/opt/conda/lib/python3.7/site-packages/starlette/exceptions.py", line 62, in __call__
    await self.app(scope, receive, sender)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 590, in __call__
    await route(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 208, in __call__
    await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
File "/opt/conda/lib/python3.7/site-packages/fastapi/routing.py", line 129, in app
    raw_response = await run_in_threadpool(dependant.call, **values)
File "/opt/conda/lib/python3.7/site-packages/starlette/concurrency.py", line 25, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
File "/opt/conda/lib/python3.7/asyncio/futures.py", line 263, in __await__
    yield self  # This tells Task to wait for completion.
File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 318, in __wakeup
    future.result()
File "/opt/conda/lib/python3.7/asyncio/futures.py", line 181, in result
    raise self._exception
File "/opt/conda/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/lib/python3.7/site-packages/prodigy/app.py", line 399, in get_session_questions
    return _shared_get_questions(req.session_id, excludes=req.excludes)
File "/opt/conda/lib/python3.7/site-packages/prodigy/app.py", line 370, in _shared_get_questions
    tasks = controller.get_questions(session_id=session_id, excludes=excludes)
File "cython_src/prodigy/core.pyx", line 138, in prodigy.core.Controller.get_questions
File "cython_src/prodigy/components/feeds.pyx", line 68, in prodigy.components.feeds.SharedFeed.get_questions
File "cython_src/prodigy/components/feeds.pyx", line 78, in prodigy.components.feeds.SharedFeed.get_next_batch
File "cython_src/prodigy/components/feeds.pyx", line 79, in prodigy.components.feeds.SharedFeed.get_next_batch
File "/root/src/prodigy/utils.py", line 140, in get_stream
    for e in stream:
File "cython_src/prodigy/components/sorters.pyx", line 98, in __iter__
File "cython_src/prodigy/components/sorters.pyx", line 29, in genexpr
File "/root/src/prodigy/utils.py", line 248, in score_examples
    for score, example in model(stream):
File "/root/src/prodigy/utils.py", line 96, in __call__
    for eg in stream:
File "/root/src/prodigy/utils.py", line 220, in filter_existing
    for e in stream:
File "cython_src/prodigy/components/filters.pyx", line 37, in filter_duplicates
File "/root/src/prodigy/utils.py", line 241, in set_hashes
    e['_task_hash'] = xxh32_intdigest(dataset.lower() + '-' + e['digest'] + '-' + e['label'].lower())
KeyError: 'label'

@tiangolo, @ines please let me know if I can help resolve this. Also, maybe it’s worth removing the solved tag to better reflect that this issue persists?

Okay, so if I understand this correctly, we did find out that the underlying problem here is that something in the async stack swallows the exception and it only surfaces as an INFO log, right?

Yes, I think that's what's going on.
However the exception doesn't even surface as an INFO log message. That INFO message seen above comes from a load balancer, not anything in Prodigy. I could verify that by running Prodigy in Docker without a load balancer and nothing showed up in the logs. Rather the exception only surfaces once Prodigy is exited.