Image.manual ERROR: Couldn't save annotations. Make sure the server is running correctly

Hello!

I have encountered the issue mentioned above and could not find a solution. I have searched everywhere on this forum and on Google but found nothing that helped.

I tried running with both PRODIGY_LOGGING=basic and PRODIGY_LOGGING=verbose: no logs show any errors. I also tried swapping SQLite for PostgreSQL, with the same error.

Specs:

  • prodigy==1.12.7
  • Ubuntu 23.10
  • Python 3.11.6

Custom recipe, ran with:

    script:
      - >-
        prodigy image.assisted images7
        datasets/images7
        datasets/images7.pred.beta.mini.json
        --label TITLE,AUTHORS,AUTHOR_AFFILIATIONS,ABSTRACT,THANKS,BODY,FIGURE,FIGURE.CAPTION,TABLE,TABLE.CAPTION,EQUATION,PAGE_NO,BIBLIOGRAPHY,FOOTNOTES,META,MISC,HEADER,FOOTER,COPYRIGHT
        -F scripts/recipes/assisted.py

recipe in question:

import copy
import json
import typing as t
from typing import List, Optional

import prodigy
from prodigy.components.loaders import Images
from prodigy.types import StreamType
from prodigy.util import get_labels, set_hashes, split_string

from scripts.constants import CLASS_NAMES
from scripts.recipes.loaders import _to_poly


class PredictedSpan(t.TypedDict):
    label: str
    x: float
    y: float
    width: float
    height: float
    center: t.Tuple[float, float]
    points: t.List[t.Tuple[float, float]]


def make_spans(
    boxes: t.List[t.Tuple[float, float, float, float]],
    labels: List[int],
    scores: List[float],
) -> List[PredictedSpan]:
    """Create the spans for the predicted bounding boxes"""
    spans = []
    for box, label_idx, score in zip(boxes, labels, scores):
        x0, y0, x1, y1 = box
        center = ((x0 + x1) / 2, (y0 + y1) / 2)
        span = {
            "label": CLASS_NAMES[label_idx - 1],
            "x": x0,
            "y": y0,
            "width": x1 - x0,
            "height": y1 - y0,
            "center": center,
            "points": _to_poly(box),
        }
        spans.append(span)
    return spans


def make_labels(bbox_path: str, stream: StreamType, threshold: float) -> StreamType:
    """Add the predicted labels in the 'labels' key of the image spans"""
    examples = list(stream)
    with open(bbox_path, "r") as f:
        predictions = json.load(f)

    for eg in examples:
        task = copy.deepcopy(eg)
        filename = task["path"]
        if prediction := predictions.get(filename):
            boxes, labels, scores = (
                prediction["boxes"],
                prediction["labels"],
                prediction["scores"],
            )
            # Filter the predictions based on the threshold
            boxes = [box for i, box in enumerate(boxes) if scores[i] > threshold]
            labels = [label for i, label in enumerate(labels) if scores[i] > threshold]
            scores = [score for score in scores if score > threshold]
            spans = make_spans(boxes, labels, scores)
            task["spans"] = spans
        task = set_hashes(task)
        yield task


@prodigy.recipe(
    "image.assisted",
    # fmt: off
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Data to assist/annotate (directory of images, file path or '-' to read from standard input)", "positional", None, str),  # noqa: E501
    bbox_path=("Path to the bounding box annotations file (this model doesn't have OCR installed)", "positional", None, str),  # noqa: E501
    label=("Comma-separated label(s) to annotate or text file with one label per line", "option", "l", get_labels),
    exclude=("Comma-separated list of dataset IDs whose annotations to exclude", "option", "e", split_string),
    threshold=("Threshold to filter the predictions (0 - 1)", "option", "t", float),
    darken=("Darken image to make boxes stand out more", "flag", "D", bool),
    # fmt: on
)
def assisted(
    dataset: str,
    source: str,
    bbox_path: str,
    label: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
    threshold: float = 0.7,
    darken: bool = False,
):
    """
    Annotate documents with the help of a layout model.
    """
    # Much of the following code is based on image.manual
    # Source: https://github.com/explosion/prodigy-recipes/blob/master/image/image_manual.py
    stream = Images(source)
    # Update the stream to add bounding boxes (based on annotations) and labels (based on the
    # finetuned model).
    # stream = make_bboxes(bbox_path, stream)
    stream = make_labels(bbox_path, stream, threshold)

    def before_db(examples):
        for eg in examples:
            if eg["image"].startswith("data:") and "path" in eg:
                eg["image"] = eg["path"]
        return examples

    return {
        "view_id": "image_manual",  # Annotation interface to use
        "before_db": before_db,  # Function to call before the examples are added to the database
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": stream,  # Incoming stream of examples
        "exclude": exclude,  # List of dataset names to exclude
        "config": {  # Additional config settings, mostly for app UI
            "label": ", ".join(label) if label is not None else "all",
            "labels": label,  # Selectable label options,
            "darken_image": 0.3 if darken else 0,
        },
    }
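The `before_db` hook above can be exercised in isolation. The sketch below runs the same swap on hypothetical sample tasks (the file names are illustrative, not from the real dataset):

```python
# Minimal, standalone sketch of the before_db hook above, run on
# hypothetical example tasks (not real Prodigy data).

def before_db(examples):
    # Replace base64 data URIs with the original file path so the
    # (potentially huge) encoded image is never written to the DB.
    for eg in examples:
        if eg["image"].startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples

tasks = [
    {"image": "data:image/png;base64,iVBORw0...", "path": "images7/page_01.png"},
    {"image": "images7/page_02.png"},  # no data URI: left untouched
]
out = before_db(tasks)
print(out[0]["image"])  # the data URI is swapped for the path
```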

An example of the data that is causing the error can be found here

Any help would be greatly appreciated!

Hi @biagiodistefano,

If you Ctrl-C out of this error state, does any traceback show up? I see the Prodigy invocation comes from a YAML file. If you don't get a Python traceback upon Ctrl-C, could you try running the recipe locally, directly in your terminal, and Ctrl-C there, to see if we can get some Python error messages?
Thanks!

Hi @magdaaniol ,

Thanks for your reply. Unfortunately not: when I Ctrl-C out of the server, I don't get any traceback.

And here's the catch: locally it does not raise any errors! I cannot reproduce it locally.

A workaround I used is to just serve this to the annotators via ngrok, but it is a very short-term solution, especially given we work in different timezones and I would need to leave my laptop constantly on.

Not really sure where to go from here

And here's the catch: locally it does not raise any errors! I cannot reproduce it locally.

A tricky one then!
Could you share some more detail about the deployment where the failure happens? For example, are you using a proxy server? I'm wondering whether there's a limit on the request body size at the proxy. It looks like you're potentially saving base64-encoded data URIs for some of the examples...
Does it fail on the first attempt at saving the examples in the DB, or do you actually manage to save some batches?

It fails immediately at the first batch.

It is deployed on a DO droplet behind nginx; I had this same configuration in the past and never encountered this error (although it was for text based annotations).

I’m not saving base64-encoded data; I swap that out for the image path in the before_db method (see recipe).

I have successfully deployed it on a different droplet and also managed to save some batches. Same OS, same Python environment, different specs. However the process gets killed after a while.

My main suspicion is that it is a resource problem. I will try to deploy on a server optimised for RAM and see what happens.

While I’m here, just out of curiosity: does prodigy try to load all the images in memory or does it load them lazily? I’m asking because I had folders with thousands of images that caused the process to get killed immediately, and only splitting them into folders with fewer images solved the problem.

I’m not sure if the two issues are related

@magdaaniol Update: I tried doing everything while monitoring RAM and CPU, and it doesn't seem to be a resource problem.

Hi @biagiodistefano,

Thanks for the update!

I’m not saving base64 encoded data, I swap that out with the image path in the before_db method (see recipe).

There is this extra `and` condition checking for the existence of "path", which is why I originally said you might "potentially" be saving the encoded data. But if you are sure it's not being saved, then that's out of the equation.

I have successfully deployed it on a different droplet and also managed to save some batches. Same OS, same Python environment, different specs. However the process gets killed after a while.

What was the difference in specs?

While I’m here, just out of curiosity: does prodigy try to load all the images in memory or does it load them lazily? I’m asking because I had folders with thousands of images that caused the process to get killed immediately, and only splitting them into folders with fewer images solved the problem.

Prodigy works by buffering batches of data, so with a single annotator only three batches should ever be loaded in memory (the current batch being annotated, the next batch, and the already annotated batch that's buffered for possible edits before it's saved to the DB). This might change if there are multiple annotators with feed overlap set to false, as more data will have to be pulled. But it is definitely not loading the entire dataset, unless your batch size is massive, of course. I'm not sure why it failed in your case, but the (now legacy) Images loader streams the files from the directory on a pull basis, so it should never try to load the entire folder.
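The pull-based behaviour described above can be illustrated with a plain Python generator (a sketch, not Prodigy's actual loader): nothing is scanned or read until the consumer asks for the next item.

```python
import os
import tempfile

def stream_images(directory):
    # Yield one task dict per file, on demand: the directory is scanned
    # lazily, so a folder with thousands of images is never held in
    # memory all at once.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield {"path": entry.path}

# Demo on a throwaway directory with a few empty files.
tmp = tempfile.mkdtemp()
for name in ("a.png", "b.png", "c.png"):
    open(os.path.join(tmp, name), "w").close()

stream = stream_images(tmp)  # nothing has been read yet
first = next(stream)         # only now is the first entry pulled
print(first["path"])
```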

I was wondering if there's anything in the front-end console, perhaps? It looks like this user here had a similar issue.

@magdaaniol you are a genius!! It was an Nginx problem; that's why I had no logs: Nginx was not even passing the request to Prodigy.

I solved it by adding client_max_body_size 300m; to my nginx config.
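For reference, that directive can live in the http, server, or location block of the Nginx config; a minimal server block (hostname and upstream port are placeholders) might look like:

```nginx
server {
    listen 80;
    server_name annotate.example.com;  # placeholder hostname

    # Allow request bodies up to 300 MB (the Nginx default is only 1 MB),
    # so large image payloads reach Prodigy instead of being rejected
    # with "413 Request Entity Too Large".
    client_max_body_size 300m;

    location / {
        proxy_pass http://127.0.0.1:8080;  # placeholder Prodigy port
    }
}
```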

It makes sense: given it's images, the body can be large.

Thank you a lot!
