Deploy prodigy with Kubernetes and SQLite

smanjil · March 2, 2019, 10:00am

I am trying to deploy a Prodigy instance on AWS using Kubernetes.

It currently uses an SQLite database.

It works fine with the image when I run it with docker run ...

But when deploying with Kubernetes, I get this in the logs:

Added dataset dataset to database SQLite.
Traceback (most recent call last):
  File "__main__.py", line 333, in <module>
    server(controller, controller.config)
  File "/app.py", line 61, in server
    threads=1,
  File "/usr/local/lib/python3.7/site-packages/waitress/server.py", line 49, in create_server
    adj = Adjustments(**kw)
  File "/usr/local/lib/python3.7/site-packages/waitress/adjustments.py", line 295, in __init__
    setattr(self, k, self._param_map[k](v))
ValueError: invalid literal for int() with base 10: 'tcp://172.20.1.208:8080'

I am trying to figure out what could cause this. Any help would be appreciated.

honnibal · March 5, 2019, 4:27pm

I don’t have much experience with Kubernetes, but two things come to mind here:

1. If you use the SQLite backend, make sure you’re placing the database in a persistent volume. My guess is that if you used the default location (in the home directory), that state won’t be persistent, so you’ll lose data. I would probably switch to a different backend; I think SQLite is likely to be a poor fit for Kubernetes.
2. It looks to me like you’ve got some setting that expects an integer but is receiving the whole connection string. Perhaps you’re passing the connection string to an argument that expects only the port?
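A quick way to check the second point is to reproduce the error in isolation. waitress converts its port setting with int(), so any value shaped like tcp://172.20.1.208:8080 fails with exactly the message from the traceback. The sketch below shows that failure, plus a hypothetical parse_port helper (not part of Prodigy or waitress) that accepts either a bare number or a URL-style value and extracts the port with the standard library's urllib.parse.urlsplit.

```python
from urllib.parse import urlsplit


def parse_port(value: str) -> int:
    """Return a TCP port from a bare number ("8080") or a URL-style
    value ("tcp://172.20.1.208:8080").

    Hypothetical helper for illustration; not part of Prodigy or waitress.
    """
    value = value.strip()
    if value.isdigit():
        return int(value)
    # urlsplit understands any "scheme://host:port" shape, including tcp://
    parts = urlsplit(value)
    if parts.port is None:
        raise ValueError(f"no port found in {value!r}")
    return parts.port


# Plain int() fails the same way as in the traceback:
try:
    int("tcp://172.20.1.208:8080")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'tcp://172.20.1.208:8080'

print(parse_port("8080"))                     # 8080
print(parse_port("tcp://172.20.1.208:8080"))  # 8080
```

Treat a helper like this as a diagnostic aid rather than the fix: if a port setting contains a full tcp:// address, something in the environment is setting it, and it is better to find and stop that than to paper over it.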

Taking a little step back here, I’m guessing that you want to have a setup where you run some command and Kubernetes launches a new Prodigy task, and you get the URL of the task, right? This workflow requires a couple of steps of indirection.

If you just launch Prodigy tasks one-by-one on your laptop, it listens on localhost and you can point your browser to localhost. But if you’re launching tasks on remote machines, you probably want a reverse proxy, which should map the localhost URLs to something you can access. And then if you’re also starting and stopping tasks under automation, you probably also want something that will keep track of all the Prodigy tasks, allocate them names, get the ports they’re listening on, and organise the mapping for your reverse proxy.

For Prodigy Scale, we’re using Nomad to launch the Prodigy tasks, with Consul for service discovery. We then use Traefik as the reverse proxy, which has a neat integration with Consul’s service catalog.
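To make the bookkeeping described above concrete, here is a minimal in-memory sketch of what such a component does: give each task a name, allocate it a port, and expose a route table that a reverse proxy could be configured from. The TaskRegistry class, its starting port, and the /<name>/ route shape are all hypothetical. In the Prodigy Scale setup, Consul's service catalog and Traefik take on these jobs, and this is not how either of them works internally.

```python
from itertools import count


class TaskRegistry:
    """Hypothetical registry: names tasks, allocates ports, and builds a
    path -> upstream mapping for a reverse proxy."""

    def __init__(self, host: str = "127.0.0.1", first_port: int = 8081):
        self.host = host
        self._next_port = count(first_port)  # simple monotonically increasing allocator
        self.tasks: dict[str, int] = {}

    def register(self, name: str) -> int:
        if name in self.tasks:
            raise ValueError(f"task {name!r} is already registered")
        port = next(self._next_port)
        self.tasks[name] = port
        return port

    def unregister(self, name: str) -> None:
        self.tasks.pop(name, None)

    def proxy_routes(self) -> dict[str, str]:
        # One route per running task, sorted by name for stable output.
        return {
            f"/{name}/": f"http://{self.host}:{port}"
            for name, port in sorted(self.tasks.items())
        }


registry = TaskRegistry()
registry.register("ner-news")     # gets port 8081
registry.register("textcat-faq")  # gets port 8082
print(registry.proxy_routes())
# {'/ner-news/': 'http://127.0.0.1:8081', '/textcat-faq/': 'http://127.0.0.1:8082'}
```

Ports are never reused after unregister(), which keeps the sketch simple; a real allocator would also need to survive restarts and check that a port is actually free.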

I’m not sure what the favoured service discovery solution is for Kubernetes. It looks like Consul has a reasonable integration: https://www.consul.io/docs/platform/k8s/run.html

These are the docs for Traefik with the Consul catalog: https://docs.traefik.io/configuration/backends/consulcatalog/

smanjil · March 6, 2019, 9:01am

Thanks @honnibal, maybe this will help!

Also, for each task allocation, Kubernetes handles the port and host for the Prodigy application itself, since we supply that information in the deployment and service files. Maybe it has something to do with the reverse proxy.

As for the integer expectation: how can the same code base and settings work perfectly when I run it manually, or run the Docker image on my local PC, but not in Kubernetes? Also, we are using PostgreSQL as the database.

I will keep looking and post an update here if I find the solution.

smanjil · March 6, 2019, 10:35am

Hi,

It seems this was an issue with Kubernetes: in the deployment and service files, the app names and labels cannot be prodigy itself.

It seems strange, but there must be some reason behind it.
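A likely reason: for every active Service, Kubernetes injects environment variables into pods named after the Service, upper-cased with dashes turned into underscores. Alongside {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT, it also sets Docker-link-style variables, including {SVCNAME}_PORT=tcp://<cluster-ip>:<port>. A Service called prodigy therefore produces PRODIGY_PORT=tcp://172.20.1.208:8080, which is exactly the value in the traceback. Assuming Prodigy reads PRODIGY_PORT (and PRODIGY_HOST) from the environment as overrides, that value ends up in waitress's int() conversion. This would also suggest it is the Service name, not the labels, that matters. The sketch below approximates the injected variables for a single-port Service and flags any that collide with those two settings; the function names and the PRODIGY_ENV_SETTINGS set are hypothetical.

```python
def service_env_vars(name: str, cluster_ip: str, port: int, proto: str = "tcp") -> dict:
    """Approximate the env vars Kubernetes injects for one single-port Service."""
    svc = name.upper().replace("-", "_")
    p = proto.upper()
    url = f"{proto}://{cluster_ip}:{port}"
    return {
        f"{svc}_SERVICE_HOST": cluster_ip,
        f"{svc}_SERVICE_PORT": str(port),
        # Docker-link-style variables:
        f"{svc}_PORT": url,
        f"{svc}_PORT_{port}_{p}": url,
        f"{svc}_PORT_{port}_{p}_PROTO": proto,
        f"{svc}_PORT_{port}_{p}_PORT": str(port),
        f"{svc}_PORT_{port}_{p}_ADDR": cluster_ip,
    }


# Assumption: the Prodigy settings read from the environment that matter here.
PRODIGY_ENV_SETTINGS = {"PRODIGY_PORT", "PRODIGY_HOST"}


def collisions(service_name: str, cluster_ip: str, port: int) -> dict:
    """Return injected variables that would shadow a Prodigy setting."""
    env = service_env_vars(service_name, cluster_ip, port)
    return {k: v for k, v in env.items() if k in PRODIGY_ENV_SETTINGS}


print(collisions("prodigy", "172.20.1.208", 8080))
# {'PRODIGY_PORT': 'tcp://172.20.1.208:8080'}
print(collisions("annotation-tool", "172.20.1.208", 8080))
# {}
```

Renaming the Service, which is what the fix above amounts to, avoids the collision. On Kubernetes 1.13 and later, setting enableServiceLinks: false in the pod spec should also stop these service variables from being injected.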

honnibal · March 6, 2019, 12:18pm

How are you informing Prodigy of the port and host? Are you using the prodigy.json, or are you setting environment variables?

I’m guessing environment variables, right? Because otherwise it would seem to me you’d need to rebuild the container image every time just to change that file?

Edit: Ah wait… You possibly don’t need to customize this in Prodigy at all? I guess Prodigy can always listen on the same port within the container…
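If you do configure it, the container only needs a fixed host and port, and Kubernetes maps the Service port onto that containerPort. The sketch below resolves host and port from defaults, an optional prodigy.json, and PRODIGY_HOST/PRODIGY_PORT, and fails early with a readable message when PRODIGY_PORT is not a number. The function name, the defaults (assumed here to be localhost and 8080), and the precedence order (environment over file over defaults) are assumptions for illustration, not Prodigy's actual internals.

```python
import json
import os
from pathlib import Path

# Assumed defaults for illustration.
DEFAULTS = {"host": "localhost", "port": 8080}


def resolve_server_settings(config_path, environ=None):
    """Hypothetical resolution order: defaults < prodigy.json < environment."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)

    # Layer in host/port from the JSON file, if it exists.
    path = Path(config_path)
    if path.is_file():
        file_cfg = json.loads(path.read_text())
        for key in ("host", "port"):
            if key in file_cfg:
                settings[key] = file_cfg[key]

    # Environment variables win over the file.
    if "PRODIGY_HOST" in environ:
        settings["host"] = environ["PRODIGY_HOST"]
    if "PRODIGY_PORT" in environ:
        raw = environ["PRODIGY_PORT"].strip()
        if not raw.isdigit():
            raise ValueError(
                f"PRODIGY_PORT={raw!r} is not a port number. If it looks like "
                "tcp://<ip>:<port>, a Kubernetes Service named 'prodigy' "
                "may have injected it."
            )
        settings["port"] = int(raw)

    # Coerce file values too, so a bad prodigy.json also fails early.
    settings["port"] = int(settings["port"])
    return settings


print(resolve_server_settings("/nonexistent/prodigy.json",
                              environ={"PRODIGY_HOST": "0.0.0.0"}))
# {'host': '0.0.0.0', 'port': 8080}
```

Inside a container you would normally set the host to 0.0.0.0, since a server bound to localhost is only reachable from inside the pod.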

Christian · August 27, 2019, 7:45pm

Could you elaborate on what the fix was? I'm having the same issue.

Christian · September 5, 2019, 10:44am

Thanks, I made it work with the same trick on Kubernetes.

Did you manage to set up the reverse proxy? I am getting a 504 Gateway Time-out with Nginx.

astrajohn · April 14, 2020, 8:43am

Hi! What exactly was the secret sauce here? I seem to be missing the key point about how to fix this.
