Hi,
we have a database of text that gets updated in real time, and I want to connect it to the Prodigy UI for annotation tasks, so that the pool of annotation records is updated in near real time. How can we achieve this?
Hi Meka,
it depends a bit on what you mean by "near real-time", but I'll list some ideas that might help.
Option A: Cron
You could schedule a cronjob that downloads data from your database to the machine that's running Prodigy, and have it restart the Prodigy server at regular intervals so that you're always working with a reasonably up-to-date dataset. This approach is a bit hacky in the sense that each restart causes Prodigy to go down momentarily, but it's relatively quick to set up.
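For example, assuming you have a script that exports the database to a JSONL file and that you run Prodigy as a systemd service (both names below are placeholders, not something Prodigy ships with), the crontab might look roughly like this:

# Hypothetical crontab: re-export the data and restart Prodigy every hour
0 * * * * python /path/to/export_from_db.py > /data/examples.jsonl
5 * * * * systemctl restart prodigy.service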
Option B: Custom Recipe
A neater approach might be to write a custom recipe. To copy the pseudocode from the docs:
import prodigy

@prodigy.recipe(
    "my-custom-recipe",
    dataset=("Dataset to save answers to", "positional", None, str),
    view_id=("Annotation interface", "option", "v", str)
)
def my_custom_recipe(dataset, view_id="text"):
    # Load your own streams from anywhere you want
    stream = load_my_custom_stream()

    def update(examples):
        # This function is triggered when Prodigy receives annotations
        print(f"Received {len(examples)} annotations!")

    return {
        "dataset": dataset,
        "view_id": view_id,
        "stream": stream,
        "update": update
    }
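Assuming you save the recipe above in a file called recipe.py (the filename is just an example), you can start the server with the -F flag, which points Prodigy at the file containing the recipe code:

prodigy my-custom-recipe my_dataset -F recipe.py

Here, my_dataset is the dataset the annotations will be saved to.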
The stream here is a sequence of dictionaries that contain the items to be annotated. Typically these are read in from a file on disk, but nothing is stopping you from writing a Python generator that queries your database for new items. This approach does involve writing a custom recipe for your task, but it feels like the most flexible option.
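To make that last idea concrete, here's a minimal sketch of such a generator. It assumes a SQLite database with a "texts" table that has "id" and "body" columns; the filename, table, and column names are all made up for illustration, so substitute your own database client and query:

import sqlite3
import time

def load_my_custom_stream():
    # Hypothetical example: poll a SQLite database for rows we haven't
    # seen yet. Swap in your own client and query here.
    last_id = 0
    while True:
        conn = sqlite3.connect("my_texts.db")
        rows = conn.execute(
            "SELECT id, body FROM texts WHERE id > ? ORDER BY id",
            (last_id,),
        ).fetchall()
        conn.close()
        for row_id, body in rows:
            last_id = row_id
            # Prodigy tasks are dictionaries; "text" is the key that the
            # "text" view_id renders, and "meta" shows up in the UI
            yield {"text": body, "meta": {"row_id": row_id}}
        if not rows:
            # No new records yet; wait a bit before polling again
            time.sleep(5)

Prodigy consumes the stream lazily in batches, so an infinite generator like this works fine: new database rows will show up for annotation as the annotator works through the queue, without restarting the server.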
Does this help?