Deploying Prodigy on Cloud Platform (Heroku)

metep · June 18, 2021, 8:25am

Hello, I have been trying to deploy Prodigy on Heroku but haven't managed to yet. I'm a bit inexperienced with web applications and cloud deployment. Is there a specific guide on how to deploy Prodigy on Heroku, or is it not possible? I found some posts about using a Dockerfile, but I'd rather not go that way. Could someone who has deployed Prodigy to the cloud before, especially on Heroku, give me a hand with this?

ines · June 20, 2021, 1:21am

Hi! In general, you can deploy Prodigy like any other Python app that starts a web server. I haven't really used Heroku myself, but I found this guide, which looks pretty straightforward:

In your case, the command you run would be the prodigy command that starts the Prodigy server. You just need to make sure that you also upload the Prodigy wheel and specify it in your requirements.txt, so it can be installed on the server (see "Python Dependencies via Pip" in the Heroku Dev Center). If you need to configure the host and port to run Prodigy on, you can set the PRODIGY_HOST and PRODIGY_PORT environment variables.

I do think that if you're just starting out and haven't done much cloud deployment, Docker could be a good option? It'll take care of setting up the environment for you, so you won't have to worry about any of that. Here's a Dockerfile that might help: Cloud deploy dockerfile

Matt2021 · October 12, 2021, 3:16pm

I tried the tutorial "How to Deploy a Python Script or Bot to Heroku in 5 Minutes", but I didn't manage to get it running.

requirements.txt (license changed)

--extra-index-url https://1111-22AB-3344-55CD@download.prodi.gy/index
prodigy>=1.11.0,<2.0.0

Procfile

web: prodigy mark.py
worker: prodigy mark.py

file structure in GitHub
├── Procfile
├── requirements.txt
├── mark.py
└── images.jsonl

I get the error:
Can't find recipe or command 'run'

My question is: how do I run the recipe with my own arguments, as I would in the terminal? For example, how would I start this recipe on my cloud application?
$ prodigy mark fing_lens_images ./images.jsonl --loader jsonl --label GOOD_IMAGE --view-id classification

Do I have to run from within a Python file, something like os.system("prodigy mark...")?

Additionally, do I have to specify the port / location somewhere? Something like this, with a Flask example:

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 5000))
    app.run(host='0.0.0.0', port=port)

Best, Matt

ines · October 14, 2021, 11:20am

Is the Procfile supposed to include the command to run to start the server? In that case, I think you want to put the recipe call in there, e.g. prodigy mark fing_lens_images ... and so on.

Alternatively, you can also call into prodigy.serve from Python (see "Components and Functions" in the Prodigy docs).

You can specify the host and port via keyword arguments here, or put it in your prodigy.json, or define it via the environment variables PRODIGY_HOST and PRODIGY_PORT.
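A minimal sketch of the environment-variable route, assuming a Heroku web dyno: Heroku assigns the port at runtime through the PORT variable, and the server has to listen on 0.0.0.0 rather than localhost. The helper names prodigy_env, command_args and launch are hypothetical, and the recipe arguments are simply the mark call from the question above. Computing PRODIGY_PORT at startup also avoids relying on a Config Var such as PRODIGY_PORT=$PORT, which, as far as I know, Heroku stores literally instead of expanding.

```python
import os
from typing import Dict, List, Mapping


def prodigy_env(environ: Mapping[str, str], default_port: int = 8080) -> Dict[str, str]:
    """Return a copy of `environ` with PRODIGY_HOST and PRODIGY_PORT set.

    Heroku's PORT wins; an existing PRODIGY_PORT is used next, and
    `default_port` is the fallback (e.g. when running locally).
    """
    env = dict(environ)
    port = environ.get("PORT") or environ.get("PRODIGY_PORT") or str(default_port)
    env["PRODIGY_PORT"] = str(int(port))  # fail early on a non-numeric value
    env["PRODIGY_HOST"] = "0.0.0.0"  # listen on all interfaces, not localhost
    return env


def command_args() -> List[str]:
    """The same recipe call you would type in a terminal, split into arguments."""
    return [
        "prodigy", "mark", "fing_lens_images", "./images.jsonl",
        "--loader", "jsonl", "--label", "GOOD_IMAGE",
        "--view-id", "classification",
    ]


def launch() -> None:
    """Replace the current Python process with the Prodigy server."""
    args = command_args()
    os.execvpe(args[0], args, prodigy_env(os.environ))
```

A main.py could call launch() under if __name__ == "__main__":. Because os.execvpe replaces the Python process with the Prodigy server, there's no need for os.system. The same host and port could equally be passed to prodigy.serve as keyword arguments.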

Matt2021 · October 14, 2021, 12:47pm

Yes, my Procfile was wrong. In the meantime I was able to get Prodigy started.
I also added a prodigy.json:

{
...
  "port": 80,
  "host": "0.0.0.0",
...
}

Although I can start Prodigy now, the Prodigy web interface doesn't show up at the app's Heroku link (e.g. https://myappname.herokuapp.com), even though I get the message that the web server has started:
Starting the webserver at http://0.0.0.0:80

I can change the PORT / HOST variables in the environment. Using the port Heroku provides via PRODIGY_PORT=$PORT, as some have suggested, doesn't change the outcome either.

Heroku logs:
heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/"
host=myappname.herokuapp.com request_id=e0dbbb...
fwd="xx.xx.xxx.xxx" dyno= connect= service= status=503 bytes= protocol=https

Since I'm not very familiar with cloud deployment, could it be that Heroku is less suitable for Prodigy?
Something I found:

Heroku vs Docker
Environment: One of the most important differences between Heroku and Docker is that Heroku must run in its own cloud environment, while Docker can run in an environment of your choice—whether that's your laptop, a remote server, or a public cloud service like Amazon Web Services (AWS).

(Heroku worked with Flask when I used gunicorn.)

Matt2021 · October 25, 2021, 6:04pm

In the meantime I've tried a lot of things, but nothing worked.
Could you recommend a tutorial or documentation with some hints on how to solve this?

Since Prodigy doesn't have a specific "file.py" that I start, it's different from deploying Flask / Django (which works).

Thanks, Matt

ljvmiranda921 · October 27, 2021, 8:33am

Hi @Matt2021 !

Just to make sure that we've covered all bases, can you try the following (in order of importance):

Go to the Heroku settings of your app, then Config Vars, and set WEB_CONCURRENCY to 1. Heroku seems to default to 2 (on the free tier), but Prodigy only needs a single worker.
Create a file in your project root, main.py, and call prodigy.serve there. Here's a sample of what it looks like:

import os

import prodigy

# Use the port Heroku assigns, not Prodigy's default
port = int(os.environ.get("PORT", 8080))
# Bind to all interfaces, not "localhost"
host = "0.0.0.0"

if __name__ == "__main__":
    prodigy.serve(
        "<TODO>",  # e.g. ner.manual test ...
        host=host,
        port=port,
    )

Then in your Procfile you should add:

web: python main.py

I'm pretty sure your original approach (using prodigy.json and supplying the prodigy command in the Procfile directly) should also work; I'm just sharing what has worked for me.

If you're still debugging, I also recommend turning on logging. Be careful, though, because this might expose sensitive data, especially if your ports are exposed. See: https://prodi.gy/docs/install#debugging-logging

Again, go to the Heroku settings of your app, then Config Vars, and set PRODIGY_LOGGING to verbose

Matt2021 · November 3, 2021, 11:18am

Hello @ljvmiranda921, thank you very much for your help! It worked and displayed the prodigy interface.

Could you give me some further guidance regarding the database? My goal is to run separate sessions for different users, who can annotate images independently, so I need to rely on a PostgreSQL database.

So far I have activated the Postgres database in Heroku and changed the relevant part of prodigy.json:

{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "dbname": "prodigy",
      "user": "username-given-by-heroku",
      "password": "password-given-by-heroku"
    }
  }
}

But with this change, I can't access the app anymore. Do I have to import psycopg2 and establish a connection with psycopg2 inside main.py?

Or do I have to use something similar to "environ.get"? Heroku says:
"Database Credentials: Please note that these credentials are not permanent. Heroku rotates credentials periodically and updates applications where this database is attached."

Best, Matt

ljvmiranda921 · November 4, 2021, 1:14am

Hi @Matt2021 !

Glad it worked

My question would be, do I have to add import psycopg2 and establish a connection with psycopg2 inside the main.py?

For the database, you just need to make sure the driver (psycopg2) is installed with the app; Prodigy only needs the driver. Can you check the logs to see whether it's connecting properly? You can test the connection by running the script in the Prodigy "Database" docs.

"Database Credentials: Please note that these credentials are not permanent. Heroku rotates credentials periodically and updates applications where this database is attached."

Perhaps it works similarly to $PORT. If that's the case, you can try setting the PostgreSQL environment variables, as described in the PostgreSQL documentation (section 34.15, "Environment Variables").

My goal is to start different sessions for different users, who can annotate images independently.

Another option is to still use SQLite with a file on disk. You just need to ensure that the database isn't wiped whenever the app restarts.

Matt2021 · November 8, 2021, 3:30pm

Thank you, the hint about the environment variables solved it. I entered the variables DATABASE_URL, PGHOST, PGPASSWORD, PGPORT and PGUSER into Heroku's Config Vars, which worked.
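Since Heroku rotates the credentials and updates DATABASE_URL when it does, values copied by hand into separate Config Vars may eventually go stale. A minimal sketch, assuming DATABASE_URL has the usual postgres://user:password@host:port/dbname form: the hypothetical helper pg_env_from_url derives the standard libpq variables (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE) at startup, before Prodigy opens its connection through psycopg2. Values written explicitly into db_settings in prodigy.json would presumably still take precedence over these variables, so they'd need to be left out there.

```python
from typing import Dict
from urllib.parse import unquote, urlsplit


def pg_env_from_url(database_url: str) -> Dict[str, str]:
    """Translate a postgres:// URL into libpq environment variables."""
    parts = urlsplit(database_url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"Unexpected scheme: {parts.scheme!r}")
    env = {
        "PGHOST": parts.hostname or "",
        "PGPORT": str(parts.port or 5432),
        # urlsplit leaves percent-escapes in place, so decode them here
        "PGUSER": unquote(parts.username or ""),
        "PGPASSWORD": unquote(parts.password or ""),
        # The path is "/dbname"; strip the leading slash
        "PGDATABASE": unquote(parts.path.lstrip("/")),
    }
    # Leave out empty values so libpq falls back to its own defaults
    return {key: value for key, value in env.items() if value}
```

For example, at the top of main.py: os.environ.update(pg_env_from_url(os.environ["DATABASE_URL"])). Because this runs on every dyno start, it picks up whatever credentials Heroku has most recently written into DATABASE_URL.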

One last question concerning the use of a custom recipe.
I changed the prodigy.serve command in main.py:

"image-caption-loop data_testset ./load_images.jsonl ./mark_loop.py"

Following other forum posts, I checked the dash in -F and tested it with and without -F, as well as with and without the .py extension (see "prodigy.serve does not work with custom recipe").

I get the error: "✘ Can't find recipe 'image-caption-loop".

The recipe in mark_loop.py looks like this:

import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "image-caption-loop",
    dataset=("The dataset to save to", "positional", None, str),
    file_path=("Path to images", "positional", None, str),
)
def image_caption_loop(dataset, file_path):
    # Blocks of the interface: a single classification block
    blocks = [
        {"view_id": "classification"}
    ]

    def get_stream():
        # Go through all images once per label, re-reading the JSONL file each time
        for label in ["FIRST_LABEL", "SECOND_LABEL"]:
            examples = JSONL(file_path)  # path given when running the recipe, e.g. ./load_images.jsonl
            for eg in examples:
                eg["label"] = label
                yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "blocks",
        "config": {"blocks": blocks},
    }

Do I have to specify the database somehow in the recipe?
Thanks again for your help!

ljvmiranda921 · November 9, 2021, 12:16am

Hi @Matt2021 ,

Glad it worked!

Just a sanity check: are we sure mark_loop.py is being uploaded to the Heroku instance?
Also, does this work locally? For the former, you can check your files by running:

heroku run bash
ls .

I don't think you need to specify the database.

Matt2021 · November 9, 2021, 9:56am

I checked again: the mark_loop.py file is uploaded to the Heroku instance.
Also, I can run the custom recipe directly on my local computer with: python -m prodigy image-caption-loop data_testset ./load_images.jsonl -F mark_loop.py

But if I start the custom recipe via main.py locally, I get the same error: ✘ Can't find recipe 'image-caption-loop. However, I can start one of the built-in Prodigy recipes locally via main.py, e.g. image.manual images_dataset ..., which is found and runs.

Solution
I found this post suggesting adding the custom recipe to main.py together with the serve command, and it works: prodigy.serve does not work with custom recipe (#2 by ines)
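Why this works, as far as I understand it: prodigy.serve has no equivalent of the -F flag, so a custom recipe only becomes available once the file containing its @prodigy.recipe decorator has been imported. If mark_loop.py sits next to main.py, a plain import mark_loop at the top of main.py should be enough. The hypothetical helper load_recipe_file below does the same for a file given by path, using importlib from the standard library.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_recipe_file(path: str) -> ModuleType:
    """Import a Python file by path so its decorators run and register recipes."""
    file_path = Path(path).resolve()
    spec = importlib.util.spec_from_file_location(file_path.stem, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[file_path.stem] = module  # make it importable by name afterwards
    spec.loader.exec_module(module)  # runs module-level code, incl. @prodigy.recipe(...)
    return module
```

In main.py, load_recipe_file("mark_loop.py") would go before prodigy.serve("image-caption-loop data_testset ./load_images.jsonl", host=host, port=port). The trailing ./mark_loop.py from the earlier serve string presumably should be dropped, since the recipe only defines two positional arguments.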

Next, I'll test whether everything works with the database.

Once again, thank you @ljvmiranda921 for your help!

Edit - solved
Is there a workaround for the --remove-base64 flag? It isn't recognized by prodigy.serve either when I use the custom recipe.

This post, which uses a before_db(examples) function, solved it: Labelling a set of images (classification) - #3 by strickvl
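For reference, a minimal sketch of that kind of before_db callback. Prodigy calls a recipe's before_db function with the annotated examples just before they're saved, so swapping an inline base64 data URI for the original file path keeps the database small. It assumes each example carries a path key, which, as far as I know, examples from Prodigy's image loader do; the name strip_base64 is hypothetical.

```python
from typing import Dict, List


def strip_base64(examples: List[Dict]) -> List[Dict]:
    """Replace inline base64 images with their file path before saving."""
    for eg in examples:
        image = eg.get("image", "")
        # Only touch data URIs, and only if we know where the file came from
        if image.startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples
```

It would be returned from the recipe alongside the other components, e.g. "before_db": strip_base64. Examples whose image is already a URL or path are left untouched.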

vinitrinh · April 1, 2022, 8:27am

Hi @ljvmiranda921 ,

After annotating on Heroku, how do I pull the annotations to my local computer?

On my Heroku app, I have annotations saved to the "main-db" dataset.

However, when I run heroku run prodigy stats -ls, I get the following results:

============================== ✨  Prodigy Stats ==============================

Version          1.11.6                        
Location         /app/.heroku/python/lib/python3.10/site-packages/prodigy
Prodigy Home     /app/.prodigy                 
Platform         Linux-4.4.0-1101-aws-x86_64-with-glibc2.31
Python Version   3.10.4                        
Database Name    SQLite                        
Database Id      sqlite                        
Total Datasets   0                             
Total Sessions   0

ljvmiranda921 · April 5, 2022, 11:20am

Hmm, it's quite unusual that the datasets aren't registering in the app. To be sure, you could download the database file itself from /app/.prodigy/prodigy.db. Is main-db a SQLite database, or did you configure something on Heroku to use a different backend?

I'm not well-versed in Heroku, but I remember that you can use something like ps:copy to pull files from a dyno.

alissa · May 25, 2022, 8:04pm

@vinitrinh Hello! Were you able to resolve this issue of seeing 0 datasets/sessions when running heroku run prodigy stats?

miladrogha · June 3, 2022, 5:40am

Hello all. I have the same problem as @vinitrinh. After deploying Prodigy to Heroku (thanks to @Matt2021 and @ljvmiranda921), I still can't get the database to work. Everything seems to work fine and annotations seem to be saved to the custom dataset, but when I run heroku run prodigy stats I get 0 datasets:

My main file:

My prodigy.json file, which I added to the /app folder (I tried to override the default prodigy.json; not sure if it was a good idea to customize it this way with another copy in the app folder):

koaning · June 3, 2022, 7:01am

@miladrogha in the future, please refrain from posting screenshots. These are impossible to copy/paste, often harder to read and they won't be indexed by search engines.

Tagging team members on the forum also won't guarantee that they'll be able to respond. We're a team that's handling the question and the person who responds depends on availability.

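I'm not that familiar with the Heroku platform, but I wonder if you've deployed Prodigy as a serverless service. If so, the state may be lost after a while because the containers can spin up or down. As far as I know, Heroku dynos also have an ephemeral filesystem, and heroku run starts a separate one-off dyno that doesn't see files written by the web dyno, which would explain why prodigy stats reports 0 datasets. Since SQLite is typically stored on disk, you may need to configure a PostgreSQL database hosted by Heroku instead.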

miladrogha · June 3, 2022, 6:44pm

Thanks @koaning . Sorry for the confusion.

So I switched to using Heroku's PostgreSQL. The only problem is that the data in the content column of the Examples table looks weird:

\x7b2274657874223a22447572696e6720323032312077652067656e65726174656420726576656e7565206f662024362e322062696c6c696f6e2c2075702034252066726f6d20323032302e222c225f696e7

miladrogha · June 6, 2022, 10:03pm

I was able to resolve the issue. It seems this is just how PostgreSQL stores the data. To access the data from your local terminal, you can use db-out:

heroku run prodigy db-out name-of-your-dataset > <output-path> --dry

For example:

heroku run prodigy db-out Db1 > ~/res.jsonl --dry

This saves the annotations in the dataset (for example "Db1") in JSONL format, which you can easily use later.

More on the raw format (it is actually the PostgreSQL "bytea" data type): see the PostgreSQL documentation on the bytea type.
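To see what those raw values contain: PostgreSQL displays bytea columns in hex format by default (a \x prefix followed by two hex digits per byte), and here the bytes appear to be the example serialized as UTF-8 JSON. The value quoted above starts with {"text":"During 2021 .... A minimal sketch, assuming you've copied such a value out of a SQL client; the helper names decode_bytea_hex and decode_example are hypothetical, and db-out remains the simpler route for exporting whole datasets.

```python
import json
from typing import Any, Dict


def decode_bytea_hex(value: str) -> str:
    """Turn PostgreSQL's hex bytea text (e.g. '\\x7b22...') back into a string."""
    # Drop the "\x" marker PostgreSQL puts in front of hex-encoded bytea
    if value.startswith("\\x"):
        value = value[2:]
    return bytes.fromhex(value).decode("utf-8")


def decode_example(value: str) -> Dict[str, Any]:
    """Parse a complete (not truncated) bytea value into the stored example dict."""
    return json.loads(decode_bytea_hex(value))
```

For example, decode_bytea_hex("\\x7b2274657874223a22447572696e67") returns '{"text":"During'. A truncated value like the one quoted above decodes to text but isn't valid JSON, so decode_example only works on complete values.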

ryanwesslen · June 7, 2022, 2:04pm

Hi @miladrogha! Thanks for posting your solution! Let us know if you have any further questions.
