Unable to launch prodigy.serve through Apache

My use case is that I want to host Prodigy through my hosted Debian instance so that I can classify data from an iPad/phone.

After reading this forum, it seems other users have succeeded at this, but I haven't been able to.

As of right now, I have the following code:

run_server.wsgi

from start_server import run_prodigy_server
import start_server

start_server.py

import prodigy
import os
from common.config import app_root

model = 'ner.teach'
dataset_name = 'dataset_test'
json_file = 'news_headlines.jsonl'
labels = ['PERSON', 'ORG']
host = '0.0.0.0'
port = 8081
full_path = os.path.join(app_root(), *['ml', 'aiprodigy'], json_file)

prodigy.serve(model, dataset_name, 'en_core_web_sm', full_path, None, None, labels, None, port=port)

This is then run through Apache's mod_wsgi. However, I get the following error in my logs:

mod_wsgi (pid=30165): Exception occurred processing WSGI script '/home/user/project/dev/ml/aiprodigy/run_server.wsgi'.
 Traceback (most recent call last):
   File "/home/user/project/dev/ml/aiprodigy/run_server.wsgi", line 7, in <module>
     from start_server import run_prodigy_server
   File "/home/user/project/dev/ml/aiprodigy/start_server.py", line 19, in <module>
     None, None, labels, host=host, port=port)
   File "/home/user/.local/lib/python3.7/site-packages/prodigy/__init__.py", line 46, in serve
     server(controller, controller.config)
   File "/home/user/.local/lib/python3.7/site-packages/prodigy/app.py", line 476, in server
     log_config=_uvicorn_log_config,
   File "/home/user/.local/lib/python3.7/site-packages/uvicorn/main.py", line 346, in run
     server.run()
   File "/home/user/.local/lib/python3.7/site-packages/uvicorn/main.py", line 374, in run
     loop.run_until_complete(self.serve(sockets=sockets))
   File "uvloop/loop.pyx", line 1450, in uvloop.loop.Loop.run_until_complete
   File "uvloop/loop.pyx", line 1443, in uvloop.loop.Loop.run_until_complete
   File "uvloop/loop.pyx", line 1351, in uvloop.loop.Loop.run_forever
   File "uvloop/loop.pyx", line 497, in uvloop.loop.Loop._run
   File "uvloop/loop.pyx", line 281, in uvloop.loop.Loop._setup_or_resume_signals
   File "uvloop/loop.pyx", line 270, in uvloop.loop.Loop._setup_or_resume_signals
   File "uvloop/loop.pyx", line 3238, in uvloop.loop._set_signal_wakeup_fd
 ValueError: set_wakeup_fd only works in main thread

This might be related to the WSGI setup in some way. Any clue how to fix this? Alternatively, is there a better way to host Prodigy (e.g. through ASGI)?

I have a very similar problem. I've created a Flask-based wrapper app because I wanted my users to be able to submit unlabelled examples and label them in the same interface. Long story short, my Flask app has a before_first_request function that starts a new thread in which I call prodigy.serve. This works fine with Prodigy v1.8.5 but fails in v1.9.4 with the following error:

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/Users/44097208/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/44097208/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/44097208/Development/gmml-simple-annotator/src/ui.py", line 76, in thread_function
    prodigy.serve("custom.ner.manual test en_core_web_sm http://localhost:8000/annotation-app-queue/example")
  File "/Users/44097208/Development/gmml-simple-annotator/venv/lib/python3.6/site-packages/prodigy/__init__.py", line 38, in serve
    server(controller, controller.config)
  File "/Users/44097208/Development/gmml-simple-annotator/venv/lib/python3.6/site-packages/prodigy/app.py", line 476, in server
    log_config=_uvicorn_log_config,
  File "/Users/44097208/Development/gmml-simple-annotator/venv/lib/python3.6/site-packages/uvicorn/main.py", line 346, in run
    server.run()
  File "/Users/44097208/Development/gmml-simple-annotator/venv/lib/python3.6/site-packages/uvicorn/main.py", line 373, in run
    loop = asyncio.get_event_loop()
  File "/Users/44097208/anaconda3/lib/python3.6/asyncio/events.py", line 694, in get_event_loop
    return get_event_loop_policy().get_event_loop()
  File "/Users/44097208/anaconda3/lib/python3.6/asyncio/events.py", line 602, in get_event_loop
    % threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'Thread-4'.

Is there any way I can make this setup work with v1.9.x? Some Googling suggested adding asyncio.set_event_loop(asyncio.new_event_loop()) before calling prodigy.serve, but that didn't work.
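
Both tracebacks point at standard-library rules about threads rather than at Prodigy itself. A minimal stdlib-only sketch (it imports neither Prodigy nor Uvicorn) that reproduces those rules inside a worker thread: a non-main thread has no event loop by default, setting one removes that error, but signal wake-up file descriptors can still only be installed from the main thread. The assumption here is that both mod_wsgi and the Flask before_first_request hook end up calling prodigy.serve outside the process's main thread, which is consistent with "Thread-4" in the second traceback and "only works in main thread" in the first.

```python
import asyncio
import signal
import threading
from typing import Dict, Union


def probe_worker_thread() -> Dict[str, Union[str, bool]]:
    """Run three checks inside a non-main thread and report what happened."""
    results: Dict[str, Union[str, bool]] = {}

    def worker() -> None:
        # 1. A non-main thread has no event loop unless one is set explicitly
        #    (the RuntimeError in the Flask traceback).
        try:
            asyncio.get_event_loop()
            results["get_event_loop"] = "ok"
        except RuntimeError as exc:
            results["get_event_loop"] = f"RuntimeError: {exc}"

        # 2. Creating and setting a loop removes that particular error...
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        results["loop_set"] = asyncio.get_event_loop() is loop

        # 3. ...but the signal wake-up fd (used for Ctrl+C/SIGTERM handling) can
        #    only be set from the main thread (the ValueError in the mod_wsgi traceback).
        try:
            signal.set_wakeup_fd(-1)
            results["set_wakeup_fd"] = "ok"
        except ValueError as exc:
            results["set_wakeup_fd"] = f"ValueError: {exc}"
        finally:
            asyncio.set_event_loop(None)
            loop.close()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return results
```

This is why adding asyncio.set_event_loop(...) alone is not enough: it fixes step 1, but a server that installs signal handlers still trips over step 3 when it is not running in the main thread.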

Thanks for the report @freefall and @einarbmag!

I'll look into it soon. Meanwhile, here's a summary of the current state and the next things to try.

The latest version of Prodigy uses FastAPI with Uvicorn, which run on ASGI instead of WSGI.
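
To make the difference concrete, here is a stdlib-only sketch of the two calling conventions (toy apps, not Prodigy code). A WSGI app is a synchronous callable taking environ and start_response; an ASGI app is a coroutine taking scope, receive and send. mod_wsgi only knows how to call the first kind, so an ASGI app like the one FastAPI builds can't simply be dropped into a .wsgi script. The call_wsgi/call_asgi helpers are hypothetical stand-ins for a server.

```python
import asyncio
from typing import Callable, List, Tuple


def wsgi_app(environ: dict, start_response: Callable) -> List[bytes]:
    # WSGI (PEP 3333): a plain synchronous callable, invoked once per request.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]


async def asgi_app(scope: dict, receive: Callable, send: Callable) -> None:
    # ASGI: a coroutine that talks to the server through awaitable receive/send.
    if scope["type"] != "http":
        return
    await receive()  # read the (empty) request body event
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


def call_wsgi() -> Tuple[str, bytes]:
    """Act as a tiny WSGI server: call the app and collect status and body."""
    captured = {}

    def start_response(status: str, headers: list) -> None:
        # Simplified: a real start_response also returns a write() callable.
        captured["status"] = status

    body = b"".join(wsgi_app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, start_response))
    return captured["status"], body


async def call_asgi() -> List[dict]:
    """Act as a tiny ASGI server: drive the coroutine and collect sent messages."""
    sent: List[dict] = []

    async def receive() -> dict:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: dict) -> None:
        sent.append(message)

    await asgi_app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent
```

Usage: call_wsgi() runs synchronously, while the ASGI side needs an event loop, e.g. asyncio.run(call_asgi()). That event loop is exactly what Uvicorn provides.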

Uvicorn has a setting to serve a WSGI application: https://www.uvicorn.org/settings/#application-interface. I still have to check how that would interact with the rest of the components.

Another option: Uvicorn can also run under Gunicorn, which might also make it possible to use WSGI through Gunicorn.

Meanwhile, Prodigy ships with the uncompiled source of app.py, which is where the API lives. It contains an explicit call to uvicorn.run(), so if you want, you can take a look and see if you find a solution before I do.

Re-reading the comments in this thread, I think I overlooked that there's probably a better and simpler way to solve both cases.

@freefall for your case: since the latest versions of Prodigy run on Uvicorn, there's no real need for Apache, Nginx or anything similar in front, as both the framework and the server (Uvicorn) are async themselves. You can just run it directly and have it listen on the chosen host/IP and port. If you do need Apache in front for some other reason, you can use a ProxyPass instead: https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypass
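
A minimal sketch of running Prodigy directly as its own process, reusing the values from the question. It assumes the v1.9-style prodigy.serve(command, **config) call, where the first argument is the full recipe command string (the form used in the second traceback) and host/port are passed as config overrides; check this against the docs for your version. build_command and main are hypothetical helper names, and the command string assumes paths without spaces.

```python
from typing import Iterable

# Values taken from the question; adjust paths and labels for your project.
RECIPE = "ner.teach"
DATASET = "dataset_test"
SPACY_MODEL = "en_core_web_sm"
SOURCE = "news_headlines.jsonl"
LABELS = ["PERSON", "ORG"]


def build_command(recipe: str, dataset: str, model: str, source: str,
                  labels: Iterable[str]) -> str:
    """Hypothetical helper: assemble the recipe command string for prodigy.serve."""
    return f"{recipe} {dataset} {model} {source} --label {','.join(labels)}"


def main() -> None:
    import prodigy  # imported lazily so build_command can be used without Prodigy

    # Called from the script's top level, so this runs in the process's main
    # thread and Uvicorn can set up its event loop and signal handlers.
    prodigy.serve(
        build_command(RECIPE, DATASET, SPACY_MODEL, SOURCE, LABELS),
        host="0.0.0.0",  # listen on all interfaces so the iPad/phone can reach it
        port=8081,
    )
```

Usage: call main() at the bottom of the script (or behind if __name__ == "__main__":) and run it with python under a process manager such as systemd instead of mod_wsgi. If Apache stays in front, its ProxyPass would point at port 8081.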

@einarbmag for your use case, the closest option to what you're doing now would be to use a subprocess instead of a thread. This is because the code underneath is async (it runs sync code in a threadpool), and Uvicorn expects to set up its event loop and signal handlers in the main thread of its own process, as both tracebacks show. Still, it would probably be better to start Prodigy as a separate process that doesn't depend on your Flask application and manage it independently; that would likely simplify configuration and management.
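
A minimal stdlib sketch of the subprocess approach: Prodigy is started with python -m prodigy <recipe> ... in a child process, which gets its own main thread. The helper names are hypothetical, and passing the port through the PRODIGY_PORT environment variable is an assumption to verify against your Prodigy version's documentation.

```python
import os
import subprocess
import sys
from typing import List, Optional


def prodigy_argv(recipe: str, *recipe_args: str) -> List[str]:
    """Hypothetical helper: argv for running a recipe via `python -m prodigy`."""
    # sys.executable makes the child use the same virtualenv as the Flask app.
    return [sys.executable, "-m", "prodigy", recipe, *recipe_args]


def start_process(argv: List[str], port: Optional[int] = None) -> subprocess.Popen:
    """Launch argv in a separate process and return immediately."""
    env = dict(os.environ)
    if port is not None:
        env["PRODIGY_PORT"] = str(port)  # assumption: Prodigy reads this variable
    return subprocess.Popen(argv, env=env)


def stop_process(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Ask the process to exit, kill it if it doesn't, and return its exit code."""
    if proc.poll() is None:
        proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode
```

Usage from the Flask app might look like start_process(prodigy_argv("custom.ner.manual", "test", "en_core_web_sm", "http://localhost:8000/annotation-app-queue/example", "-F", "recipe.py"), port=8080), where recipe.py is a placeholder for the file defining the custom recipe. Keep the returned Popen object so the app can call stop_process on shutdown.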

If these ideas don't work for you, let us know and we'll help you find the best approach for each specific case.

Thanks @tiangolo, I'll have a look at that. The appeal of wrapping everything in a Flask-based app is that I can create a data queue API before spinning up the Prodigy process, all in one go (it has to happen in that order, or Prodigy crashes).

Got it. Then a subprocess launched from Flask could probably do it for your use case :crossed_fingers:
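
Since the queue API has to be up before Prodigy starts, one way for the launcher to enforce that ordering is to poll the queue's TCP port before starting the Prodigy subprocess. A minimal stdlib sketch; wait_for_port is a hypothetical helper, and an open port only means the server accepts connections, not that every route is ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.2) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening; close and report success
        except OSError:
            time.sleep(interval)  # refused or timed out: retry until the deadline
    return False
```

Usage: if wait_for_port("localhost", 8000) returns False, raise an error instead of launching Prodigy; otherwise start the Prodigy subprocess (port 8000 here is the queue API port from the traceback above).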

Thanks @tiangolo, I'll get back to this later, as I've been busy with other parts of Prodigy and focusing more on quick iteration. This is very helpful, I appreciate it.

I was getting the same error as @einarbmag when trying to start a subprocess with Prodigy from a FastAPI application in a Docker container (thread here). I was able to fix it by setting the workers argument to 2 when starting my FastAPI app within Docker.