Run Prodigy with a remote MySQL database using Docker

I've just written a Medium article on how to run Prodigy as a Docker container with a connection to a remote MySQL database. I thought I'd share the Dockerfile here for others to check. This should make it very easy to deploy Prodigy as a Web App service (which I will cover soon).

# https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

FROM python:3.7

COPY requirements.txt /app/
WORKDIR /app

RUN pip install --upgrade pip \
    && pip install --trusted-host pypi.python.org -r requirements.txt

COPY wheel/prodigy-xxx-linux_x86_64.whl ./wheel/
RUN pip install wheel/prodigy-xxx-linux_x86_64.whl \
    && rm -rf wheel/prodigy-xxx-linux_x86_64.whl
RUN python -m spacy download en_core_web_md

COPY prodigy.json .
COPY data ./data/

ENV PRODIGY_HOME /app
ENV PRODIGY_LOGGING "verbose"
ENV PRODIGY_ALLOWED_SESSIONS "user1,user2"
ENV PRODIGY_BASIC_AUTH_USER "admin"
ENV PRODIGY_BASIC_AUTH_PASS "password"

EXPOSE 80

CMD python -m prodigy ner.manual ner_news en_core_web_md ./data/dataset.jsonl --label PERSON,ORG,PRODUCT

Awesome, thank you so much for sharing! :raised_hands::tada:

Hi @DrGabrielHarris, thanks for the tutorial. I've followed your instructions to set up a remote MySQL database. One issue I'm facing is that I can't see the annotations in the database, even though the remote DB connects without any error and the log shows the annotations being saved to it. Do you have any idea what is going wrong here? Thanks.

Hi Dr Gabriel Harris,

I hope you are well. I am trying to build a similar Docker image:

FROM python:3.8

COPY requirements.txt /app/
WORKDIR /app

RUN python -m pip install --upgrade pip
RUN pip install --upgrade pip \
    && pip install --trusted-host pypi.python.org -r requirements.txt
RUN python -m pip install --upgrade pip setuptools wheel

COPY wheel/prodigy-1.11.0a11-cp38-cp38-linux_x86_64.whl ./wheel/
RUN pip install wheel/*.whl

RUN python -m spacy download en_core_web_md

COPY prodigy.json .
COPY data ./data/

ENV PRODIGY_HOME /app
ENV PRODIGY_LOGGING "verbose"
ENV PRODIGY_ALLOWED_SESSIONS "luca,francesco,andrea,vincenzo,francesco,carlos,ryan,jack,adrian,bill"
ENV PRODIGY_BASIC_AUTH_USER "admin"
ENV PRODIGY_BASIC_AUTH_PASS "belegendary"

EXPOSE 80

CMD python -m prodigy ner.manual ner_news en_core_web_md ./data/combined_data_mixed.jsonl --label ORG,PERSON,LOCATION

In return, I get the following error:

Removing intermediate container c8c746f62306
---> dbd4d5cef213
Step 7/18 : COPY wheel/prodigy-1.11.0a11-cp38-cp38-linux_x86_64.whl ./wheel/
---> b22ce8a15ec4
Step 8/18 : RUN pip install wheel/*.whl
---> Running in 86c9423ed2de
ERROR: prodigy-1.11.0a11-cp38-cp38-linux_x86_64.whl is not a supported wheel on this platform.
The command '/bin/sh -c pip install wheel/*.whl' returned a non-zero code: 1

Do you know what could be going wrong?

Kind regards,
Adrian Arranz

Maybe double-check that this file exists and is copied correctly, and that it's the correct wheel that matches the Python version and OS of your container.
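To make that check a bit more concrete, here is a rough Python sketch that reads the tags out of a wheel filename and compares them with the interpreter it runs under. The helper names (`parse_wheel_tags`, `interpreter_tags`, `looks_compatible`) are made up for illustration and are not part of pip or Prodigy. The comparison is deliberately simplistic: it assumes CPython and an exact platform match, whereas pip accepts a wider set of tags (`pip debug --verbose` should list the ones it supports).

```python
import sys
import sysconfig


def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename.

    Wheel filenames end in "-<python tag>-<abi tag>-<platform tag>.whl".
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def interpreter_tags():
    """Approximate (python_tag, platform_tag) for the running interpreter.

    Assumption: CPython. Other implementations use different prefixes.
    """
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def looks_compatible(filename):
    """Simplified check: Python tag and platform tag must match exactly."""
    wheel_python, _, wheel_platform = parse_wheel_tags(filename)
    python_tag, platform_tag = interpreter_tags()
    return wheel_python == python_tag and wheel_platform == platform_tag
```

Running `looks_compatible("prodigy-1.11.0a11-cp38-cp38-linux_x86_64.whl")` with the container's own Python (for example in a temporary `RUN python ...` step) shows whether the Python version and the CPU architecture of the image line up with the wheel you copied in.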

Got it, thank you very much! :smiley:

I've got another slight problem when trying to run the Docker container. I made sure that the MySQL database is running and everything seemed perfect. I think it's an authentication error, but I'm confused, since I supplied the username and password for the database in the prodigy.json file. Here is my command:

 % docker run --name prodigy -p 3306:80 prodigy-webapp:1.0.0

and here is the error:

  File "/usr/local/lib/python3.9/site-packages/pymysql/connections.py", line 664, in connect
    raise exc
peewee.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
Using 3 label(s): ORG, PERSON, LOCATION

Here is my prodigy.json file:

{
    "port": 80,
    "host": "0.0.0.0",
    "db": "mysql",
    "db_settings": {
        "mysql": {
            "user": "root",
            "password": "#########",
            "host": "127.0.0.1",
            "port": 3306,
            "database": "prodigydb"
        }
    },
    "feed_overlap": false,
    "show_stats": true
}
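One thing that may be worth ruling out with a config like this: inside a container, `127.0.0.1` refers to the container itself, not to the machine that runs Docker, so a MySQL server listening on the host would not be reachable at that address. The sketch below is only a plain TCP reachability check to run from inside the container. The function names are hypothetical, and a successful connection says nothing about MySQL credentials.

```python
import socket


def mysql_endpoint(config):
    """Return (host, port) from a parsed prodigy.json dictionary."""
    settings = config["db_settings"]["mysql"]
    # Assumption: fall back to MySQL's default port if none is configured.
    return settings["host"], int(settings.get("port", 3306))


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

For example, `can_reach(*mysql_endpoint(json.load(open("prodigy.json"))))` (after `import json`) returning `False` inside the container would point to a networking problem rather than an authentication one.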

Here is my Dockerfile:

FROM python:3.9

COPY requirements.txt /app/
WORKDIR /app

RUN python -m pip install --upgrade pip
RUN pip install --upgrade pip \
    && pip install --trusted-host pypi.python.org -r requirements.txt
RUN python -m pip install --upgrade pip setuptools wheel

COPY wheel/prodigy-1.11.0a11-cp39-cp39-linux_aarch64.whl ./wheel/
RUN pip install wheel/prodigy-1.11.0a11-cp39-cp39-linux_aarch64.whl
RUN pip install spacy

RUN python -m spacy download en_core_web_md


COPY prodigy.json .
COPY data ./data/

ENV PRODIGY_HOME /app
ENV PRODIGY_LOGGING "verbose"
ENV PRODIGY_ALLOWED_SESSIONS "user1,user2"
ENV PRODIGY_BASIC_AUTH_USER "admin"
ENV PRODIGY_BASIC_AUTH_PASS "1234"

EXPOSE 80

CMD python -m prodigy ner.manual ner_news en_core_web_md ./data/combined_data_mixed.jsonl --label ORG,PERSON,LOCATION

Does it work outside of Docker? It sounds like for some reason, peewee can't connect to the MySQL database from inside the Docker container. To make debugging easier, you could try adding a simple script that calls peewee's MySQLDatabase directly with your settings (see the "Database" page of the peewee 3.14.4 documentation).

If that works, Prodigy should be able to connect to your DB as well. If this turns up the same problem, you can debug it independently. I also found this StackOverflow thread where the comments suggest checking the allowed hosts and whether the Python driver is compatible with the version of MySQL.
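A minimal sketch of such a script could look like the following. It assumes `peewee` and a MySQL driver such as `pymysql` are installed in the container, and that the settings have the same shape as the `"mysql"` block of `prodigy.json`. The function names `split_settings` and `try_connect` are made up for this example.

```python
def split_settings(mysql_settings):
    """Split the "mysql" settings block into (database_name, connect_kwargs)."""
    kwargs = dict(mysql_settings)  # copy so the caller's dict is not modified
    database = kwargs.pop("database")
    return database, kwargs


def try_connect(mysql_settings):
    """Open a connection with peewee, run a trivial query, and close it."""
    # Imported here so the helper above works even without peewee installed.
    from peewee import MySQLDatabase

    database, kwargs = split_settings(mysql_settings)
    # Remaining keys (user, password, host, port) are passed on to the driver.
    db = MySQLDatabase(database, **kwargs)
    db.connect()  # raises peewee.OperationalError if the server is unreachable
    try:
        return db.execute_sql("SELECT 1").fetchone()
    finally:
        db.close()
```

Calling `try_connect(...)` with the values from `prodigy.json` inside the container should either return a row or raise the same `OperationalError` as above, which would confirm the problem is independent of Prodigy.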

Yes, MySQL works with Prodigy outside of Docker. Building the image works perfectly; however, when I try to run it, I get that error. I will try that out, thank you. I will let you know if anything happens :slight_smile: