How to update records for annotations in real time

Hi,
We have a database of text that gets updated in real time, and I want to connect this database to the Prodigy UI for annotation tasks, so that the set of annotation records is updated in near real time. How can we achieve this?


Hi Meka,

It depends a bit on what you mean by "near real-time", but I'll list some ideas that might help.

Option A: Cron

You could schedule a cron job that downloads data from your database to the machine that's running Prodigy. You could also have the cron job restart the Prodigy server at regular intervals, which would in theory mean that you're always working with an up-to-date dataset. This approach is a bit hacky in the sense that each restart takes Prodigy down momentarily, but it's relatively quick to set up.
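The script the cron job runs could look roughly like this. It's only a sketch under assumptions: it pretends your data lives in a SQLite database with a texts table, and the paths, table and column names are placeholders you'd swap for your own setup.

import sqlite3
import srsly  # installed alongside Prodigy, handy for writing JSONL

def export_tasks(db_path="texts.db", out_path="tasks.jsonl"):
    # Pull the latest texts from the (hypothetical) database
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT id, text FROM texts").fetchall()
    conn.close()
    # Prodigy expects one task dict per line, each with at least a "text" key
    tasks = [{"text": text, "meta": {"source_id": row_id}} for row_id, text in rows]
    srsly.write_jsonl(out_path, tasks)

if __name__ == "__main__":
    export_tasks()

The same cron schedule (or a small wrapper shell script) would then restart the Prodigy server pointing at the freshly written tasks.jsonl.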

Option B: Custom Recipe

A neater approach might be to write a custom recipe. To quote the pseudocode listed in the docs:

import prodigy

@prodigy.recipe(
    "my-custom-recipe",
    dataset=("Dataset to save answers to", "positional", None, str),
    view_id=("Annotation interface", "option", "v", str)
)
def my_custom_recipe(dataset, view_id="text"):
    # Load your own streams from anywhere you want
    stream = load_my_custom_stream()

    def update(examples):
        # This function is triggered when Prodigy receives annotations
        print(f"Received {len(examples)} annotations!")

    return {
        "dataset": dataset,
        "view_id": view_id,
        "stream": stream,
        "update": update
    }

The stream here is a sequence of dictionaries that contain the items to be annotated. Typically these are read in from a file on disk, but nothing is stopping you from writing a Python generator that queries your database for new items. This does involve writing a custom recipe for your task, but it feels like the most flexible option.
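As a minimal sketch of what such a generator might look like, assuming the same kind of SQLite database as in the cron example (the table, query and polling interval are all placeholders):

import sqlite3
import time

def load_my_custom_stream(db_path="texts.db", poll_interval=30):
    # Keep track of which rows have already been sent out as tasks
    seen_ids = set()
    while True:
        conn = sqlite3.connect(db_path)
        rows = conn.execute("SELECT id, text FROM texts").fetchall()
        conn.close()
        new_rows = [(row_id, text) for row_id, text in rows if row_id not in seen_ids]
        for row_id, text in new_rows:
            seen_ids.add(row_id)
            # Each task needs at least a "text" key; "meta" is shown in the UI
            yield {"text": text, "meta": {"source_id": row_id}}
        if not new_rows:
            # Nothing new yet: wait a bit before polling the database again
            time.sleep(poll_interval)

Because Prodigy pulls tasks from the stream in batches, the generator only has to yield new items as they appear; if nothing new has arrived yet, it simply waits before checking the database again.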

Does this help?
