Distributed Annotators Using ImageServer Results in Errors

I am attempting to host a server that remote named annotators can access to supply annotated images back to me.

I am running Prodigy 1.9.9 and serving it on the external host interface by setting PRODIGY_HOST=0.0.0.0. Using the multi-annotator session URL parameters with the Image loader and feed_overlap=False, I can have multiple annotators supplying annotations, and everything runs smoothly. However, I want to keep the full-size images for labeling, and the base64 encoding done by the Image loader bloats the database to the point of being unusable.
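For context on the size problem: base64 turns every 3 bytes into 4 ASCII characters, so each inlined image grows by about a third before the `data:` prefix is even counted, and that string is saved with every annotation. A minimal sketch of the cost, using a hypothetical `to_data_uri` helper (an illustration, not Prodigy's own code):

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: Path) -> str:
    """Encode a file as a base64 data URI (hypothetical helper for illustration)."""
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def overhead(num_bytes: int) -> float:
    """Ratio of base64-encoded length to raw length for a blob of num_bytes."""
    encoded_len = len(base64.b64encode(b"\0" * num_bytes))
    return encoded_len / num_bytes
```

With a ratio of roughly 4/3, a 5 MB photo becomes about 6.7 MB of text per stored task, whereas a URL pointing at the file is only a few dozen bytes.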

The ImageServer loader is the ideal way to bypass that encoding. Unfortunately, when I use the ImageServer loader in my custom recipe, each annotator can only load the first image from their initial batch; after that, images fail to load and an error appears.

Using basic and verbose logging shows no visible errors in the log.

The red-box error shown on the server states "Can't fetch tasks. Make sure the server is running properly."

Any insights you can shed on this would be greatly appreciated. Are there other ways I can uncover the root cause of the error aside from basic and verbose logging? Is there something about my use case that you would expect to fail?

Hi! Thanks for the detailed report. And damn, I wonder if this is related to async. It would fit the pattern of async issues: you have multiple users annotating in different processes, plus the static data, plus a potential error that doesn't surface correctly. Under the hood, the image server is implemented by mounting an additional static directory at runtime, so maybe that's what's causing problems here. Does the image server work for you if you try it with only one session?

As a quick and simple workaround, you could add your own image server: serve the images from a different process on a different port (e.g. using a simple HTTP server) and use a custom loader that points each task at that server. A file image.jpg would then be sent out as http://localhost:1234/image.jpg. In your loader, you could do something like this:

from pathlib import Path
import json

dir_path = Path("/your/image/dir")  # your original images
base_url = "http://localhost:1234"  # wherever you're serving your image directory
file_ext = [".jpg", ".png"]

for file_path in dir_path.iterdir():
    if file_path.is_file() and file_path.suffix.lower() in file_ext:
        task = {
            "image": f"{base_url}/{file_path.parts[-1]}",
            "meta": {"file": str(file_path.parts[-1])},
            "path": str(file_path),
        }
        print(json.dumps(task))
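One caveat when building URLs from raw file names: names containing spaces, `#` or `?` produce broken URLs. Here's a variant of the loader above, written as a reusable function that percent-encodes each name with `urllib.parse.quote`. The function name `image_tasks` and the sorted ordering are my own assumptions for illustration:

```python
from pathlib import Path
from urllib.parse import quote


def image_tasks(dir_path, base_url, file_ext=(".jpg", ".png")):
    """Yield one task dict per image file in dir_path (hypothetical helper)."""
    dir_path = Path(dir_path)
    base_url = base_url.rstrip("/")  # avoid a double slash in the URL
    for file_path in sorted(dir_path.iterdir()):
        if file_path.is_file() and file_path.suffix.lower() in file_ext:
            yield {
                # Percent-encode the name so e.g. "my photo.jpg" becomes my%20photo.jpg
                "image": f"{base_url}/{quote(file_path.name)}",
                "meta": {"file": file_path.name},
                "path": str(file_path),
            }
```

To get the same JSONL output as before, you'd loop over `image_tasks("/your/image/dir", "http://localhost:1234")` and `print(json.dumps(task))` for each task.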

Edit: Also, maybe I'm wrong and the real cause isn't the image server or async at all, but an unrelated error whose traceback is being swallowed under the hood. Are you able to share a version of your recipe so we can try to reproduce the problem?

Ines, thanks for the speedy reply.

I was able to get things working. For your information, here are some additional insights into what may have caused the issue, and the fix I ended up with.

Original Information:
I am able to use the default image.manual recipe and ImageServer locally for a single annotator (myself) without issue. When I host it on the external interface and let even a single remote user sign on, the issue appears (even with the image.manual recipe).

The Fix:
I made some edits to your proposed code (thanks for the template):

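from pathlib import Path

# A plain generator function: it's imported and called from the custom
# recipe, so it doesn't need to be registered with @prodigy.recipe
def load_data():
    dir_path = Path("<Local_Path>")  # local directory holding the images
    base_url = "<URL_PATH>"  # URL of the separate HTTP server serving dir_path
    file_ext = [".jpg"]

    for file_path in dir_path.iterdir():
        if file_path.is_file() and file_path.suffix.lower() in file_ext:
            task = {
                "image": f"{base_url}/{file_path.name}",  # URL the web app loads
                "path": str(file_path),  # original location on disk
            }
            yield task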

Then I was able to access this loader from my custom recipe:

from custom_loader import load_data
stream = load_data()

Finally, I set up a SimpleHTTP server in Python 3, which produced the same errors as before. I reasoned that the underlying issue was its inability to handle more than one request at a time. So I implemented a more complex version of SimpleHTTP using ThreadingMixIn, SocketServer, BaseHTTPServer and related modules to support more than one active request and connection. I ran this server in Python 2, though I am certain it could easily be ported to Python 3.
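For anyone wanting a Python 3 version: since Python 3.7 the standard library includes `http.server.ThreadingHTTPServer`, which handles each request in its own thread, and `SimpleHTTPRequestHandler` accepts a `directory` argument. That makes the ThreadingMixIn setup described above much shorter. This is a minimal sketch, not the Python 2 server from this post. The directory and port 1234 are placeholders that should match the loader's `base_url`:

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_image_server(directory, host="0.0.0.0", port=1234):
    """Build a static file server for `directory` that handles each request
    in its own thread (ThreadingHTTPServer, Python 3.7+)."""
    # Bind the directory up front so the handler doesn't serve the current dir
    handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
    return ThreadingHTTPServer((host, port), handler)
```

You'd start it with `make_image_server("/your/image/dir").serve_forever()`. On 3.7+, `python -m http.server 1234 --directory /your/image/dir` also uses the threaded server. Because the server binds to 0.0.0.0 for remote annotators, the `base_url` in the loader should use the machine's externally reachable address rather than `localhost`.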

With the updated recipe, the new loader function and the custom threaded HTTP server, I was able to serve images to 6 simultaneous annotators (each holding down the space bar) without issue. This appears to solve the problem I was facing. It's likely something you've already dealt with in Prodigy Teams, but I thought I would share the details. Hope this helps, and thanks for pointing me in the right direction.
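If you want to repeat the "6 annotators holding the space bar" check without six people, a rough load test can request the image URLs from several threads at once and report any failures. A server that handles only one connection at a time tends to show up here as timeouts or errors. This is a sketch using only the standard library. `fetch` and `check_concurrent` are hypothetical names, and the URLs would come from the loader's `image` fields:

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from urllib.request import urlopen


def fetch(url, timeout=5.0):
    """GET one URL; return None on success or a short error description."""
    try:
        with urlopen(url, timeout=timeout) as response:
            response.read()  # read the full body, as the browser would
        return None
    except (OSError, HTTPException) as exc:
        # URLError, HTTPError (e.g. 404), timeouts and resets are all OSError subclasses
        return f"{type(exc).__name__}: {exc}"


def check_concurrent(urls, workers=6, timeout=5.0):
    """Fetch `urls` from `workers` threads at once; return (url, error) failures."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(lambda url: fetch(url, timeout), urls))
    return [(url, err) for url, err in zip(urls, errors) if err is not None]
```

An empty list from `check_concurrent(image_urls, workers=6)` means every request succeeded under that level of concurrency.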
