Receiving "Couldn't save annotations" error

One more issue to report: after our server was restarted and I restarted each annotator's task, the document that triggered the save error last time is showing up again, even though many documents have been annotated since it was in the queue. I don't see anything odd about that particular line in the .jsonl file, though.

I checked your settings again, but nothing looks strange. The auto_exclude_current setting is set to true, which is the default value, just like auto_count_stream, so that seems fine. A batch_size of 1 certainly isn't unheard of. The feed_overlap setting also looks fine, although that depends more on how you'd like your colleagues to label; it shouldn't cause an error.
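For reference, the settings discussed above would typically live in a prodigy.json file. Here's a sketch of how they might look; the feed_overlap value is a placeholder, since the right choice depends on your annotation workflow:

```json
{
  "batch_size": 1,
  "auto_exclude_current": true,
  "auto_count_stream": true,
  "feed_overlap": false
}
```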

I'll ping some colleagues to check if they can spot something, but in the meantime, I am curious about the logs from CloudSQL or the proxy that you use. Can you see anything on the Google Cloud side that might indicate a connection/authentication mishap?

Thank you @koaning for looking into this. Here are some more details about our setup:

  • We have a service running in a Docker container on a GCP cluster, which is the "gateway" for annotators: the annotators use it to log in and sign up for an annotation task.
  • An annotation task is identified by a unique name, a recipe, a dataset name and an input file.
  • For each unique task, we spin up a Prodigy web server inside the service container (using the prodigy.serve() function), forward the internal port to a separate service on our cluster, and redirect the annotator to this service for the given annotation task.
  • Each Prodigy web server connects to the CloudSQL database via the standard proxy provided by GCP; we use the same set-up on developers' machines that run Prodigy from the terminal: each has a Docker container running the same proxy to connect to our CloudSQL instance (over VPN).
  • We see a lot of entries like this in the CloudSQL logs (on the order of 400 per 24-hour period); they always come in pairs, separated by less than 100 ms:
New connection for prodigy-annotation-db 
Client closed local connection on 127.0.0.1:5432

We assume the "Client" is the prodigy web server, which presumably uses a pool of connections that are periodically refreshed? Our own code does nothing more than provide the proxy's host name and port number to the prodigy servers.
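To make the bullets above concrete, here is an illustrative sketch of how a per-task server might be launched. The names (allocate_port, start_task_server, PORT_RANGE) are hypothetical, not our actual code, and the Prodigy call is shown as a comment:

```python
# Hypothetical sketch of the per-task server orchestration described above.
# Function and variable names are illustrative, not taken from our codebase.
PORT_RANGE = range(9091, 9101)  # the ports exposed by the container

def allocate_port(active_ports):
    """Return the first free port in the exposed range, or None if all are taken."""
    for port in PORT_RANGE:
        if port not in active_ports:
            return port
    return None

def start_task_server(task_name, recipe, dataset, input_file, active_ports):
    """Reserve a port for a task; the real service would launch Prodigy on it."""
    port = allocate_port(active_ports)
    if port is None:
        raise RuntimeError("No free ports: too many concurrent annotation tasks")
    active_ports.add(port)
    # In the real service, Prodigy would be started here, e.g.:
    # prodigy.serve(f"{recipe} {dataset} {input_file}", port=port)
    return port
```

The real service also forwards the internal port and redirects the annotator to it, which is omitted here.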

I'm thinking out loud here, but I recall from previous work with Postgres that hitting the maximum number of connections can cause trouble. I think the default setting is 100 connections, which I believe is also what Google assumes. The Google Cloud docs also reflect this quota here and here.

If you run prodigy.serve() for each permutation of input file/name/recipe, is it likely that you hit this limit? Does your proxy close down connections to the database after a "task" is considered completed? I'm not 100% sure if this is the issue, but it seems good to check and rule out.
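As a quick back-of-the-envelope check, you can estimate whether a setup like this could approach the limit. The per-server pool size below is an assumed value, not something taken from Prodigy's internals:

```python
# Rough estimate of concurrent Postgres connections. The per-server pool
# size is an assumption for illustration, not Prodigy's actual behavior.
MAX_CONNECTIONS = 100  # common Postgres default, also assumed by Google

def estimated_connections(num_servers, pool_size_per_server, other_clients=0):
    """Estimate total concurrent connections against the database instance."""
    return num_servers * pool_size_per_server + other_clients

# e.g. 10 Prodigy servers each holding a pool of 5 connections,
# plus 20 developer/proxy connections:
total = estimated_connections(num_servers=10, pool_size_per_server=5, other_clients=20)
print(total, "of", MAX_CONNECTIONS)  # 70 of 100
```

If the estimate lands anywhere near the limit, it would be worth checking pg_stat_activity on the instance while annotation is running.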

Are there any logs from your proxy service that might shed some light on this?

Thanks @koaning. Some additional info:

  • we limit the number of Prodigy servers to 5, and typically not more than two are running concurrently
  • when the annotation task is complete, we shut down the Prodigy server associated with the task manually; presumably that closes all the connections
  • we haven't seen any errors in the CloudSQL proxy log that mention running out of connections or exceeding our quota

One of my colleagues suggested you may want to try out our new alpha release. Details are listed here. It feels worth a try given what you're experiencing. One of the key differences is that we're switching from peewee to SQLAlchemy as our ORM, which might address some of the database issues you're seeing. Do let me know if this does/doesn't help!

Could you also check if the error persists across browsers? We mainly want to rule out that this is a Chrome/Firefox/Safari mishap.

@koaning We are using Chrome. Testing the alpha release will possibly require a lot of reconfiguration on our part, so I think we will wait for the beta release or full release.

Understood. If you gain any more insights from any logs in the meantime, do let us know! It feels like there might be a bug here, and we're eager to address it once it's better understood.

So it turns out that our annotators may have been using Edge as their browser, but not consistently. So that is a possible issue. I have them exclusively using Chrome, and will check to see if we have any additional issues.


That's interesting to know! Thanks for reporting back :slight_smile:

We are thinking of testing this out, as we recently saw this error again in another task and are unable to diagnose what might be causing it. What is the latest alpha version?

hi @cheyanneb!

v1.11.8a4 is the latest. We're working hard to release v1.12 soon, so now could be a great time to prepare for it. It includes a migration script that will be required. Let us know of any issues you find!

Thanks! Is it possible to get the v1.11.8a4 wheels?

The wheels aren't available openly, but we've sent you a follow-up message with them. Anyone else who has questions can send me a direct message and I can help them get the wheels if they're interested.

We are still experiencing this error with v1.11.11. Any updates on this? Thanks!

Bummer to hear this issue hasn't gone away.

I guess the best thing for me to do now is to see whether I can reproduce this locally. With that in mind, could you share your Dockerfile? Also, is this running a custom recipe? If so, could you share that as well?

I'm likely not going to be able to mock the proxy and the CloudSQL database, but I might be able to reproduce something if I know more about the recipe/docker setup.

Perhaps a final question: does this issue persist across all annotation tasks? That is, are there recipes or situations for which it does not occur? Does it also occur when a single person is annotating?

I will see if I can share those files. As for the final question:

  • This is not associated with one recipe -- it has occurred in a variety of tasks calling different recipes. It seems random -- I've tried to find patterns, checked the original dataset to see if there was anything going on there, etc. And the logs do not reveal anything.
  • When I export the final dataset, the documents that received this error are missing, meaning the annotation never saved.
  • Annotators can click through, but every time they start annotating again (they do use Chrome, they refresh their browser, and they don't resume in a task/browser that was dormant), they have to click through the documents that are stuck in this save-error state. As with ignore, I could set up a new task to handle these missing docs, but it's usually just one or two, so we can do them manually.
  • We always have more than one person annotating the same dataset, but we experience this error both (a) when annotator 1 is annotating while annotator 2 is not, and (b) when both are annotating at the same time.
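One way to confirm which documents never saved is to diff the exported dataset against the source .jsonl. A minimal sketch, keying on the "text" field for simplicity (Prodigy's own input hashes would be more robust):

```python
import json

def missing_examples(source_lines, exported_lines, key="text"):
    """Return source examples whose key value never appears in the export."""
    exported_keys = {json.loads(line)[key] for line in exported_lines}
    return [json.loads(line) for line in source_lines
            if json.loads(line)[key] not in exported_keys]

# Tiny inline example: "doc two" was never saved.
source = ['{"text": "doc one"}', '{"text": "doc two"}', '{"text": "doc three"}']
exported = ['{"text": "doc one", "answer": "accept"}',
            '{"text": "doc three", "answer": "reject"}']
print(missing_examples(source, exported))  # [{'text': 'doc two'}]
```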

Here is our Dockerfile.local. It lets us run a Docker container locally, without relying on any GitLab or Kubernetes dependencies. To make it work (since you do not have access to our base image), substitute FROM python:3.9-slim on the second line.

# Alternative Dockerfile that does not depend on AWS
FROM gcr.io/posh-containers/nsgi-base-image:latest

LABEL maintainer="Posh NLP Team <email.address>"

ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PRODIGY_WHEEL=prodigy-1.11.7-cp39-cp39-linux_x86_64.whl

WORKDIR /workspace
# This is where the raw data files (downloaded from s3) will be stored
RUN mkdir /workspace/data

# Install prodigy and its dependencies
COPY wheels/$PRODIGY_WHEEL /workspace
RUN pip install $PRODIGY_WHEEL psycopg2-binary hypercorn pyyaml \
    google-cloud-storage \
    && rm -rf /root/.cache/pip \
    && rm -f $PRODIGY_WHEEL

# Install a trained pipeline (spaCy language model) for NER recipes
RUN python -m spacy download en_core_web_md

COPY setup.py /workspace
COPY src/ /workspace/src

RUN pip install .

# The first directive exposes the API for "logging in" (so to speak) and
# "registering" as an annotator; the next directive exposes the ports for
# connecting to the annotator-specific web server running inside the container
EXPOSE 5000
EXPOSE 9091-9100

CMD ["hypercorn", \
     "--bind", "0.0.0.0:5000", \
     "--access-logfile", "-", \
     "--error-logfile", "-", \
     "annotator.api.app:app"]

Is it possible for you to share what you're doing in this file? I imagine you're doing some custom code there to deal with the different ports for different annotators.

I'm checking on what I am able to share. But yes, this is custom code to deal with different ports for different annotators.