Hello, I have been trying to deploy Prodigy on Heroku but haven't managed it yet. I'm a bit inexperienced with web applications and cloud deployment. Is there a specific guide on how to deploy Prodigy on Heroku, or is it not possible? I found some posts about using a Dockerfile, but I'd rather not go that way. Can someone who has deployed Prodigy in the cloud, especially on Heroku, give me a hand with this?
Hi! In general, you can deploy Prodigy like any other Python app that starts a web server. I haven't really used Heroku myself, but I found this guide, which looks pretty straightforward:
In your case, the command and Python script you run would be the prodigy command to start the Prodigy server. You just need to make sure that you also upload the Prodigy wheel and specify it in your requirements.txt, so it can be installed on the server: Python Dependencies via Pip | Heroku Dev Center
If you need to configure the host and port to run Prodigy on, you can set the PRODIGY_HOST and PRODIGY_PORT environment variables.
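Heroku hands the app its port through the PORT environment variable, so one way to connect the two is to copy that value into PRODIGY_PORT before the server starts. A minimal sketch, assuming that kind of PORT injection; the helper name prodigy_env and the 8080 fallback are illustrative and not part of Prodigy:

```python
import os


def prodigy_env(environ=None, default_port="8080"):
    """Return a copy of the environment with PRODIGY_HOST/PRODIGY_PORT set.

    `prodigy_env` is a hypothetical helper; only the two PRODIGY_* variable
    names come from the thread.
    """
    env = dict(os.environ if environ is None else environ)
    # Bind to all interfaces so the platform's router can reach the server.
    env["PRODIGY_HOST"] = "0.0.0.0"
    # Reuse the platform-assigned port when there is one.
    env["PRODIGY_PORT"] = str(env.get("PORT", default_port))
    return env


print(prodigy_env({"PORT": "5000"})["PRODIGY_PORT"])
```

The resulting dict could be passed as env= to subprocess.run when starting the prodigy command.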
I do think that if you're just starting out and haven't done much cloud deployment, Docker could be a good option? It'll take care of setting up the environment for you, so you won't have to worry about any of that. Here's a Dockerfile that might help: Cloud deploy dockerfile
I tried the tutorial "How to Deploy a Python Script or Bot to Heroku in 5 Minutes", but I didn't manage to get it started.
requirements.txt (license changed)
--extra-index-url https://1111-22AB-3344-55CD@download.prodi.gy/index
prodigy>=1.11.0,<2.0.0
Procfile
web: prodigy mark.py
worker: prodigy mark.py
file structure in GitHub
├── Procfile
├── requirements.txt
├── mark.py
└── images.jsonl
I get the error:
Can't find recipe or command 'run'
My question would be, how do I run the recipe with my specifications, like I would in the command window. For example, how would I start this recipe on my cloud application?
$ prodigy mark fing_lens_images ./images.jsonl --loader jsonl --label GOOD_IMAGE --view-id classification
Do I have to run from within a Python file, something like os.system("prodigy mark...")?
Additionally, do I have to specify the port / location somewhere? Something like this, with a Flask example:
if __name__ == "__main__":
    port = int(os.environ.get("PORT", 5000))
    app.run(host='0.0.0.0', port=port)
Best, Matt
Is the Procfile supposed to include the command to run to start the server? In that case, I think you want to put the recipe call in there, e.g. prodigy mark fing_lens_images ... and so on.
Alternatively, you can also call into prodigy.serve from Python: Components and Functions · Prodigy · An annotation tool for AI, Machine Learning & NLP
You can specify the host and port via keyword arguments here, or put it in your prodigy.json, or define it via the environment variables PRODIGY_HOST and PRODIGY_PORT.
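On the os.system question: if shelling out from Python is preferred over prodigy.serve, passing an argument list to subprocess tends to be more robust than a single shell string. A minimal sketch using the recipe call quoted earlier in the thread; build_command and launch are illustrative names, not Prodigy functions:

```python
import shlex
import subprocess

RECIPE_CALL = (
    "prodigy mark fing_lens_images ./images.jsonl "
    "--loader jsonl --label GOOD_IMAGE --view-id classification"
)


def build_command(call=RECIPE_CALL):
    """Split the command line into an argument list (no shell involved)."""
    return shlex.split(call)


def launch(call=RECIPE_CALL):
    """Start the Prodigy server and block until it exits."""
    # check=True raises CalledProcessError if Prodigy exits with an error.
    return subprocess.run(build_command(call), check=True)


print(build_command())
```

Only build_command runs here; launch would be called from the script that the Procfile starts.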
Yes, my procfile was wrong. Meanwhile I was able to get prodigy started.
I also added the prodigy.json:
{
  ...
  "port": 80,
  "host": "0.0.0.0",
  ...
}
Although I'm able to start Prodigy now, I don't get the Prodigy web interface at the link Heroku provides for the app (e.g. https://myappname.herokuapp.com ). I only get the message that the web server has launched:
Starting the webserver at http://0.0.0.0:80
Changing the PORT / HOST variables in the environment, or using the PORT that Heroku provides via PRODIGY_PORT=$PORT as some suggested, does not change the outcome.
Heroku logs:
heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/"
host=myappname.herokuapp.com request_id=e0dbbb...
fwd="xx.xx.xxx.xxx" dyno= connect= service= status=503 bytes= protocol=https
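The router entry is a sequence of key=value pairs; code=H10 together with status=503 indicates that the web process had crashed or never finished booting when the request came in. A small sketch for pulling those fields out of such a line; parse_router_line is a hypothetical helper, and the sample line is a shortened copy of the log above:

```python
import shlex

LINE = (
    'heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/" '
    "host=myappname.herokuapp.com dyno= connect= service= status=503 bytes= protocol=https"
)


def parse_router_line(line):
    """Split a Heroku router log line into a dict of its key=value fields."""
    # Drop the "heroku[router]:" source prefix and keep the fields.
    _, _, fields = line.partition(": ")
    parsed = {}
    for token in shlex.split(fields):  # shlex keeps quoted values together
        key, _, value = token.partition("=")
        parsed[key] = value
    return parsed


info = parse_router_line(LINE)
print(info["code"], info["desc"], info["status"])
```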
Since I am not very familiar with cloud deployment, could it be that Heroku is less suitable for Prodigy?
Something I found:
Heroku vs Docker
Environment: One of the most important differences between Heroku and Docker is that Heroku must run in its own cloud environment, while Docker can run in an environment of your choice, whether that's your laptop, a remote server, or a public cloud service like Amazon Web Services (AWS).
(Heroku worked with Flask when I used gunicorn.)
Meanwhile I tried a lot of things, but nothing worked.
Could you recommend a tutorial or documentation with some hints on how to solve this?
Since Prodigy doesn't have a specific "file.py" that I start, it differs from deploying Flask / Django (which works).
Thanks, Matt
Hi @Matt2021 !
Just to make sure that we've covered all bases, can you try the following (in order of importance):
- Go to the Heroku settings of your app, then Config Vars, and set WEB_CONCURRENCY to the value of 1. Heroku seems to default to 2 (for free tier). And the number of workers needed for Prodigy is just 1.
- Create a file in your project root, main.py, and call the prodigy.serve command there. Here's a sample of what it looks like:
import os

import prodigy

# We should use Heroku's port, not the default version
port = int(os.environ.get("PORT", 8080))
# We should bind to this host, not "localhost"
host = "0.0.0.0"

if __name__ == "__main__":
    prodigy.serve(
        "<TODO>",  # e.g. ner.manual test ...
        host=host,
        port=port,
    )
Then in your Procfile you should add:
web: python main.py
I am pretty sure that your original approach (using prodigy.json and supplying the prodigy command in the Procfile directly) will still work. But just sharing what has worked for me.
- If you're still debugging, I also recommend turning on the logs. Be careful, though, because this might expose any sensitive data you have, especially if your ports are exposed. See: https://prodi.gy/docs/install#debugging-logging
Again, go to the Heroku settings of your app, then Config Vars, and set PRODIGY_LOGGING to verbose.
Hello @ljvmiranda921, thank you very much for your help! It worked and displayed the prodigy interface.
Could you give me some further guidance in relation to the database? My goal is to start different sessions for different users, who can annotate images independently. Therefore I have to rely on a PostgreSQL database.
So far I have activated the Postgres database in Heroku and changed the relevant part in the prodigy.json:
{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "dbname": "prodigy",
      "user": "username-given-by-heroku",
      "password": "password-given-by-heroku"
    }
  }
}
But with this change made, I can't access the app anymore. My question would be, do I have to add import psycopg2 and establish a connection with psycopg2 inside the main.py?
Or do I have to use something similar to "environ.get", because Heroku says:
"Database Credentials: Please note that these credentials are not permanent. Heroku rotates credentials periodically and updates applications where this database is attached."
Best, Matt
Hi @Matt2021 !
Glad it worked
My question would be, do I have to add import psycopg2 and establish a connection with psycopg2 inside the main.py?
For the database, you just need to ensure that the driver is installed with the app. Prodigy just needs the driver. Can you check through the logs if it's connecting properly? You can test if there's a connection by running the script here: Database · Prodigy · An annotation tool for AI, Machine Learning & NLP
"Database Credentials: Please note that these credentials are not permanent. Heroku rotates credentials periodically and updates applications where this database is attached."
Perhaps it's similar to how $PORT works. If that's the case, you can try setting the environment variables for Postgres, similar to here: PostgreSQL: Documentation: 16: 34.15. Environment Variables
My goal is to start different sessions for different users, who can annotate images independently.
Another option is to still use SQLite with a file on disk. You just need to ensure that the database isn't wiped whenever the app restarts.
Thank you, the hint to the Environment Variables solved it. I entered these variables DATABASE_URL, PGHOST, PGPASSWORD, PGPORT, PGUSER into the Config Vars of Heroku, which worked.
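For reference, DATABASE_URL carries the same pieces as the individual PG* variables. A minimal sketch of splitting it, assuming the usual postgres://user:password@host:port/dbname shape; pg_env_from_url is a hypothetical helper, the URL in the example is made up, and PGDATABASE is the libpq variable for the database name:

```python
from urllib.parse import unquote, urlsplit


def pg_env_from_url(url):
    """Map a postgres:// URL onto the libpq-style PG* variable names."""
    parts = urlsplit(url)
    return {
        "PGHOST": parts.hostname or "",
        "PGPORT": str(parts.port or 5432),  # 5432 is the PostgreSQL default
        "PGUSER": unquote(parts.username or ""),
        "PGPASSWORD": unquote(parts.password or ""),
        "PGDATABASE": parts.path.lstrip("/"),
    }


# Made-up URL for illustration only.
print(pg_env_from_url("postgres://matt:s3cret@db.example.com:5432/prodigy"))
```

Reading these values from the environment at startup, rather than hardcoding them in prodigy.json, should mean that rotated credentials are picked up the next time the app restarts.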
One last question concerning the use of a custom recipe.
I changed the prodigy.serve command in the main.py:
"image-caption-loop data_testset ./load_images.jsonl ./mark_loop.py"
Following the posts I read, I checked the dash in -F and tested with and without -F, as well as with and without the .py ending (see: prodigy.serve does not work with custom recipe).
I get the error: "✘ Can't find recipe 'image-caption-loop'".
The recipe in mark_loop.py looks like this:
import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "image-caption-loop",
    dataset=("The dataset to save to", "positional", None, str),
    file_path=("Path to images", "positional", None, str),
)
def image_caption_loop(dataset, file_path):
    # Blocks of the interface
    blocks = [
        {"view_id": "classification"}
    ]

    def get_stream():
        # Emit every example once per label
        for label in ["FIRST_LABEL", "SECOND_LABEL"]:
            # The path is passed when executing the recipe, e.g. ./img
            examples = JSONL(file_path)
            for eg in examples:
                eg["label"] = label
                yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "blocks",
        "config": {"blocks": blocks},
    }
Do I have to specify the database somehow in the recipe?
Thanks again for your help!
Hi @Matt2021 ,
Glad it worked!
Just a sanity-check, are we sure the mark_loop.py is being uploaded in the Heroku instance?
Also, does this work locally? For the former, you can check your files by running:
heroku run bash
ls .
I don't think you need to specify the database.
The mark_loop.py file is uploaded in the Heroku instance, as I checked again.
Also, I can run the custom recipe directly on my local computer with: python -m prodigy image_caption_loop data_testset ./load_images.jsonl -F mark_loop.py
But if I start the custom recipe via the main.py locally, it generates the same error: ✘ Can't find recipe 'image-caption-loop'. Nevertheless, I can start one of the prebuilt Prodigy recipes locally via the main.py, like: image.manual images_dataset ..., which gets found and runs.
Solution
I saw this post suggesting to add the custom recipe to the main.py with the serve command: prodigy.serve does not work with custom recipe - #2 by ines - which works
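As I understand the linked post, the point is that the @prodigy.recipe decorator has to run in the same process before prodigy.serve is called, so defining the recipe in main.py registers it and -F is no longer needed. A rough sketch under that assumption; label_stream and main are illustrative names, and the stream logic mirrors get_stream() from the recipe above:

```python
import os


def label_stream(load_examples, labels):
    """Yield every example once per label, mirroring get_stream() above."""
    for label in labels:
        for eg in load_examples():  # reload the examples for each label
            yield {**eg, "label": label}


def main():
    # Imported here so label_stream() can be tried without Prodigy installed.
    import prodigy
    from prodigy.components.loaders import JSONL

    @prodigy.recipe(
        "image-caption-loop",
        dataset=("The dataset to save to", "positional", None, str),
        file_path=("Path to images", "positional", None, str),
    )
    def image_caption_loop(dataset, file_path):
        labels = ["FIRST_LABEL", "SECOND_LABEL"]
        return {
            "dataset": dataset,
            "stream": label_stream(lambda: JSONL(file_path), labels),
            "view_id": "blocks",
            "config": {"blocks": [{"view_id": "classification"}]},
        }

    # The decorator above has registered the recipe, so no -F argument here.
    prodigy.serve(
        "image-caption-loop data_testset ./load_images.jsonl",
        host="0.0.0.0",
        port=int(os.environ.get("PORT", 8080)),
    )
```

In an actual main.py the file would end with a call to main(), so that the Procfile entry web: python main.py starts the server.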
So I will test it to see if everything works with the database.
Once again, thank you @ljvmiranda921 for your help!
Edit - solved
Is there a workaround to use the command "--remove-base64"? Because it also doesn't get recognized within the prodigy.serve, when I use the custom recipe.
This post with a function def before_db(examples) solved it: Labelling a set of images (classification) - #3 by strickvl
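For readers who can't follow the link, a before_db callback along these lines replaces the inline base64 data with the file path before examples are saved. This is a sketch rather than the exact code from that post, and it assumes each example carries a "path" key next to "image", which may not be the case for every loader:

```python
def before_db(examples):
    """Swap inline base64 image data for the file path before saving."""
    for eg in examples:
        image = eg.get("image", "")
        # Data URIs start with "data:"; store the original path instead.
        if isinstance(image, str) and image.startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples


sample = [{"image": "data:image/png;base64,AAAA", "path": "img/a.png"}]
print(before_db(sample))
```

The custom recipe would then return the function in its components dict under the "before_db" key.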
Hi @ljvmiranda921 ,
After annotating on Heroku, how do I pull the annotations to my local computer?
On my Heroku app, I have the following saved annotations to "main-db" dataset.
However, when I run heroku run prodigy stats -ls I get the following results:
============================== ✨ Prodigy Stats ==============================
Version 1.11.6
Location /app/.heroku/python/lib/python3.10/site-packages/prodigy
Prodigy Home /app/.prodigy
Platform Linux-4.4.0-1101-aws-x86_64-with-glibc2.31
Python Version 3.10.4
Database Name SQLite
Database Id sqlite
Total Datasets 0
Total Sessions 0
Hmm, it's quite unusual that the total number of datasets isn't registering in the app. To be sure, you can probably download the database file itself from /app/.prodigy/prodigy.db. Is main-db a SQLite database or did you configure something on Heroku to use a different backend?
I'm not well-versed with Heroku, but I remember that you can use something like ps:copy to pull files from a Dyno server.
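One thing that may be worth checking first is which backend the configuration selects, since the stats output above reports SQLite. As far as I know, heroku run starts a separate one-off dyno with its own filesystem, so a SQLite file written by the web dyno would not be visible there. A small sketch for inspecting a prodigy.json; both function names are hypothetical, and Prodigy may also read a prodigy.json from the working directory, which this does not cover:

```python
import json
from pathlib import Path


def backend_from_config(config):
    """Return the configured "db" value, falling back to Prodigy's SQLite default."""
    return config.get("db", "sqlite")


def configured_backend(config_path):
    """Read a prodigy.json and report its database backend."""
    path = Path(config_path)
    if not path.is_file():
        # No config file at this location: the default backend applies.
        return "sqlite"
    with path.open(encoding="utf8") as f:
        return backend_from_config(json.load(f))


print(backend_from_config({"db": "postgresql"}))
```

On the dyno this could be called as configured_backend("/app/.prodigy/prodigy.json"), using the Prodigy Home path shown in the stats output.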
@vinitrinh Hello! Were you able to resolve this issue of seeing 0 datasets/sessions when running heroku run prodigy stats?
Hello all. I have the same problem as @vinitrinh. After deploying Prodigy to Heroku (thanks to @Matt2021 and @ljvmiranda921 ) I still cannot make the database work. Everything seems to work fine and annotations seem to be saved to the custom dataset but when I run heroku run prodigy stats I get 0 databases:
My main file:
My prodigy.json file, which I added to the /app folder (I tried to override the default prodigy.json; not sure if it was a good idea to customize it this way with another copy in the app folder):
@miladrogha in the future, please refrain from posting screenshots. These are impossible to copy/paste, often harder to read and they won't be indexed by search engines.
Tagging team members on the forum also won't guarantee that they'll be able to respond. We're a team that's handling the question and the person who responds depends on availability.
I'm not that familiar with the Heroku platform, but I wonder if you've deployed Prodigy as a serverless service. If so, the state may be lost after a while because the containers can spin up or down. Since SQLite is typically stored on disk, you may need to configure a PostgreSQL database hosted by Heroku instead.
Thanks @koaning . Sorry for the confusion.
So I switched to using Heroku's PostgreSQL. The only problem is that the data stored in the content field of the Examples table looks weird:
\x7b2274657874223a22447572696e6720323032312077652067656e65726174656420726576656e7565206f662024362e322062696c6c696f6e2c2075702034252066726f6d20323032302e222c225f696e7
I was able to resolve the issue. It seems like it is just the way that PostgreSQL stores the data. To access the data from your local terminal you can use db-out and write:
heroku run prodigy db-out name-of-your-dataset > <output-path> --dry
For example:
prodigy db-out Db1 > ~/res.jsonl --dry
This stores the annotations in the dataset (for example "Db1") in a jsonl format which you can use easily later.
More on the raw format (it is actually the "bytea" data type): bytea type
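To see what such a value contains without going through db-out, the hex form can be decoded directly: after the \x prefix, every two hex digits are one byte of the stored JSON. A minimal sketch, assuming the content is UTF-8 JSON as the sample above suggests; decode_bytea_hex is an illustrative helper and not part of Prodigy or PostgreSQL:

```python
def decode_bytea_hex(value):
    """Decode PostgreSQL's hex bytea text form (\\x...) into a str."""
    hex_digits = value[2:] if value.startswith("\\x") else value
    # A value copied from a truncated display may end mid-byte; drop the odd digit.
    if len(hex_digits) % 2:
        hex_digits = hex_digits[:-1]
    return bytes.fromhex(hex_digits).decode("utf8", errors="replace")


# First bytes of the value shown above.
print(decode_bytea_hex(r"\x7b2274657874223a22447572696e67"))
```

For the sample above this gives the beginning of an ordinary Prodigy task, a JSON object with a "text" key.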
Hi @miladrogha! Thanks for posting your solution! Let us know if you have any further questions.