Help with custom UI and multi-annotator setup in Prodigy

Hi support team,

I’m trying to design a custom Prodigy UI where annotators review and, if needed, correct machine-generated text suggestions for items in a list.

In my setup:

  • Each record contains a set of items (e.g. a shopping order with multiple products).
  • The UI should display the whole list in a table for context, since items may cross-reference each other.
  • One row is highlighted as the current item under review.
  • Below the table, annotators see the original description and the suggested text, with the suggestion being editable.

Here’s a toy JSON example (think of it as a parent-child UI form over a relational dataset):

{
  "order_id": "ORD123",
  "customer_age": 34,
  "customer_gender": "female",
  "items": [
    {
      "seq": 1,
      "code": "SKU001",
      "product": "Organic Apples 1kg",
      "original_description": "apples 1kg",
      "suggested_text": "One pack of organic apples (1 kilogram)"
    },
    {
      "seq": 2,
      "code": "SKU002",
      "product": "Wholegrain Bread 400g",
      "original_description": "bread 400g",
      "suggested_text": "One loaf of wholegrain bread (400 grams)"
    }
  ]
}
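(Before loading anything into the tool, I sanity-check the records with a small hypothetical helper like the one below; `validate_order` and the required-key list are my own names, not anything from Prodigy.)

```python
def validate_order(record: dict) -> list:
    """Return a list of problems found in one order record (hypothetical helper)."""
    required = ("seq", "code", "product", "original_description", "suggested_text")
    problems = []
    for item in record.get("items", []):
        for key in required:
            if key not in item:
                problems.append(f"item {item.get('seq', '?')}: missing '{key}'")
    return problems

record = {
    "order_id": "ORD123",
    "items": [
        {"seq": 1, "code": "SKU001", "product": "Organic Apples 1kg",
         "original_description": "apples 1kg",
         "suggested_text": "One pack of organic apples (1 kilogram)"},
    ],
}
print(validate_order(record))  # → []
```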
My main questions:

  1. How can I create a custom UI that shows the full context but makes it easy to edit the current item’s suggestion?
  2. Can I run this workflow with a pool of annotators working on the same dataset, and is there a way to route/distribute tasks fairly across them?

We have ~220K rows to annotate, so the UI and workflow design will make a big difference.

Any advice, pointers, or examples would be hugely appreciated!

Thanks in advance
Bilal

Hi Bilal,

Prodigy should support all the pieces you need for your workflow. You can define custom annotation interfaces that either assemble existing Prodigy components as “blocks”, or display your own HTML and JavaScript. You can even add event hooks on the server if you need interactive features that require some server-side computation.

You can find the docs on custom interfaces here: Custom Interfaces · Prodigy · An annotation tool for AI, Machine Learning & NLP. At first glance, it sounds like you’ll want a block displaying the table plus a block with a text input. I’m not sure what the best configuration for your requirements is, though; you may need to try a few variants to balance showing annotators the information they need against minimizing per-annotation interactions with the interface. If you need extra information, you can also use the pages feature. However, it’s often better to find a way to need less context and keep the annotation tasks simpler, so annotators can move faster.
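As a rough starting point, the blocks part of a recipe’s return value could look something like this (the field names such as `edited_text` are placeholders for your own keys, and this is just a sketch, not a complete recipe):

```python
# Minimal sketch of a "blocks" config: an HTML block for the context table,
# plus an editable text input for the suggestion.
blocks = [
    {"view_id": "html"},  # renders the task's "html" key (your context table)
    {
        "view_id": "text_input",
        "field_id": "edited_text",       # the key the annotator's edit is saved under
        "field_label": "Suggested text",
        "field_rows": 2,
    },
]
config = {"blocks": blocks}
print(config["blocks"][1]["field_id"])  # → edited_text
```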

Often in annotation tasks, about 95% of the examples can be decided from a very simple information display, and only a few percent need a lot of detail. If that’s the case for you, consider keeping the interface really focused and efficient, and simply have the annotators flag the tricky examples and come back to them later.
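For the flag-and-revisit pattern, Prodigy’s built-in flag icon (enabled via "show_flag" in the config) is usually enough; flagged answers are saved with a "flag" key, so you can split them out afterwards. A sketch (the `split_flagged` helper is mine, not a Prodigy API):

```python
# Enable the built-in flag icon so annotators can mark tricky examples.
config = {"show_flag": True}

def split_flagged(examples):
    """Separate flagged answers (saved with a truthy "flag" key) from the rest."""
    flagged = [eg for eg in examples if eg.get("flag")]
    rest = [eg for eg in examples if not eg.get("flag")]
    return flagged, rest

saved = [{"text": "easy one"}, {"text": "weird edge case", "flag": True}]
flagged, rest = split_flagged(saved)
print(len(flagged), len(rest))  # → 1 1
```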

Prodigy also supports routing tasks to multiple annotators in a variety of ways. You can implement your own task routing functions or use our prebuilt options. You can find the docs on this here: Task Routing · Prodigy · An annotation tool for AI, Machine Learning & NLP

Thanks for the pointers in your earlier reply! I’ve tried to implement a pages + blocks workflow using a toy example. I also used ChatGPT to help me draft the recipe.

Data (dataset/catalog.jsonl):

{"catalog_id":"CAT001","curator":"ops-team","catalog_items":[{"seq":1,"sku":"A-100","title":"Acme SuperWidget 3000","original_text":"super widget v3k","suggested_text":"Acme SuperWidget 3000"},{"seq":2,"sku":"B-200","title":"BoltMaster Kit (42pc)","original_text":"bolts kit 42 pieces","suggested_text":"BoltMaster Kit, 42-Piece"}]}

Recipe (recipe.py):

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import set_hashes
from typing import Iterable, Dict, List, Optional


def _html_catalog_table(items: List[Dict], highlight_seq: Optional[str] = None) -> str:
    """Context table with optional highlight for the current row."""
    rows = []
    for it in items:
        seq   = it.get("seq", "")
        sku   = it.get("sku", "")
        title = it.get("title", "")
        orig  = it.get("original_text", "")
        sug   = it.get("suggested_text", "")

        highlight = (str(seq) == str(highlight_seq)) if highlight_seq is not None else False
        row_style = "background-color:#fff7cc;" if highlight else ""  # pale yellow

        rows.append(
            f"<tr style='{row_style}'>"
            f"<td style='padding:6px 8px; border-bottom:1px solid #eee'>{seq}</td>"
            f"<td style='padding:6px 8px; border-bottom:1px solid #eee'>{sku}</td>"
            f"<td style='padding:6px 8px; border-bottom:1px solid #eee'>{title}</td>"
            f"<td style='padding:6px 8px; border-bottom:1px solid #eee'>{orig}</td>"
            f"<td style='padding:6px 8px; border-bottom:1px solid #eee'>{sug}</td>"
            f"</tr>"
        )

    header = (
        "<table style='width:100%; border-collapse:collapse; font-size:14px'>"
        "<thead>"
        "<tr style='text-align:left'>"
        "<th style='padding:6px 8px; border-bottom:2px solid #ccc; width:60px'>Seq</th>"
        "<th style='padding:6px 8px; border-bottom:2px solid #ccc; width:120px'>SKU</th>"
        "<th style='padding:6px 8px; border-bottom:2px solid #ccc'>Title</th>"
        "<th style='padding:6px 8px; border-bottom:2px solid #ccc'>Original</th>"
        "<th style='padding:6px 8px; border-bottom:2px solid #ccc'>Suggested</th>"
        "</tr>"
        "</thead><tbody>"
    )
    return header + "".join(rows) + "</tbody></table>"


def _iter_tasks(source: str) -> Iterable[Dict]:
    """
    One parent task per catalog (pages). Each page = one item:
    - Show full table as context with current row highlighted
    - Editable text field with the current suggestion pre-filled
    """
    for cat in JSONL(source):
        items = cat.get("catalog_items") or []
        if not isinstance(items, list) or not items:
            continue

        catalog_id = cat.get("catalog_id", "")
        curator    = cat.get("curator", "")

        pages: List[Dict] = []
        for it in items:
            seq   = it.get("seq", "")
            sku   = it.get("sku", "")
            title = it.get("title", "")
            orig  = it.get("original_text", "") or ""
            sug   = it.get("suggested_text", "") or ""

            context_table_html = _html_catalog_table(items, highlight_seq=seq)

            page_html = (
                "<div style='text-align:left; width:100%'>"
                f"<div style='margin:0 0 8px 0; color:#666'>"
                f"<strong>Catalog:</strong> {catalog_id}"
                f" &nbsp;&nbsp; | &nbsp;&nbsp; "
                f"<strong>Curator:</strong> {curator}"
                f"</div>"
                f"{context_table_html}"
                "<hr style='margin:12px 0; border:none; border-top:1px solid #ddd'/>"
                f"<div style='padding:6px 0; color:#666'><strong>Item Number:</strong> {seq}</div>"
                f"<div style='padding:6px 0'><strong>SKU:</strong> {sku}</div>"
                f"<div style='padding:6px 0'><strong>Title:</strong> {title}</div>"
                f"<div style='padding:6px 0'><strong>Original Text:</strong> {orig}</div>"
                "</div>"
            )

            page = {
                "view_id": "blocks",
                "config": {
                    "blocks": [
                        {"view_id": "html", "html": page_html},
                        {
                            "view_id": "text_input",
                            "field_id": "edited_text",
                            "field_label": "Normalized Text",
                            "placeholder": "Enter the normalized/catalog-ready text…",
                            "rows": 2
                        },
                    ]
                },
                "edited_text": sug,                       # prefill
                "text": sug or f"{catalog_id} item {seq}",# ensure hashable
                # store meta-like data under a non-special key to avoid footer chips
                "task_info": {
                    "catalog_id": catalog_id,
                    "seq": seq,
                    "sku": sku,
                    "title": title,
                    "original_text": orig,
                    "suggested_text": sug,
                },
            }

            page = set_hashes(page, input_keys=("text",))
            pages.append(page)

        if not pages:
            continue

        parent = {
            "view_id": "pages",
            "pages": pages,
            "text": f"Catalog {catalog_id}",
            "task_info": {
                "catalog_id": catalog_id,
                "curator": curator,
                "num_items": len(pages),
            },
        }
        parent = set_hashes(parent, input_keys=("text",))
        yield parent


@prodigy.recipe(
    "catalog.normalize_review",
    dataset=("Dataset to save annotations", "positional", None, str),
    source=("Path to JSONL catalog file", "positional", None, str),
)
def catalog_normalize_review(dataset: str, source: str):
    """
    Multi-annotator, no-overlap review/edit workflow:
      - One catalog per task (pages), one page per item.
      - Each catalog routed to a single session (no overlap).
      - Record session user as `reviewer`.
    """
    stream = _iter_tasks(source)

    def before_db(examples):
        for eg in examples:
            session = (
                eg.get("session_id")
                or eg.get("_session_id")
                or eg.get("session")
                or ""
            )
            for p in eg.get("pages") or []:
                orig = (p.get("task_info", {}) or {}).get("suggested_text", "")
                edited_val = p.get("edited_text", "")
                final = edited_val if isinstance(edited_val, str) and edited_val.strip() != "" else orig
                p["final_text"] = final
                p["edited"] = (final != orig)
                p["reviewer"] = session
        return examples

    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "pages",
        "before_db": before_db,
        "config": {
            "lang": "en",
            "auto_count_stream": True,
            "buttons": ["accept", "reject", "ignore"],
            # route whole catalogs to a single annotator
            "feed_overlap": False,
            "exclude_by": "input",
            # hide meta chips just in case
            "show_meta": False,
            # keybindings intentionally omitted to keep this MRE minimal;
            # the focus problem occurs even without custom keybindings
        },
    }
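(To convince myself the before_db fallback behaves as intended, I also tested the same logic standalone, outside Prodigy; `finalize_page` below is just a copy of that per-page rule, not part of the recipe.)

```python
def finalize_page(page):
    """Same fallback rule as in before_db: keep the annotator's edit if it's
    a non-empty string, otherwise fall back to the original suggestion."""
    orig = (page.get("task_info", {}) or {}).get("suggested_text", "")
    edited = page.get("edited_text", "")
    final = edited if isinstance(edited, str) and edited.strip() else orig
    page["final_text"] = final
    page["edited"] = final != orig
    return page

# Blank edit falls back to the suggestion and counts as unedited.
page = finalize_page({"task_info": {"suggested_text": "foo"}, "edited_text": "   "})
assert page["final_text"] == "foo" and page["edited"] is False
```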

I’m getting the desired UI (screenshot below). The Normalized Text field is meant for the reviewer to edit whenever they feel the suggested description could be improved:

Problem: in the text_input block, the field loses focus after a single keystroke, so the annotator has to click back into it to continue typing.

Questions:

  1. Is there a workaround to keep focus so annotators can type normally?
  2. Also, could you please let me know if I’m doing something in a way that’s not the Prodigy way, and if there are improvements you’d recommend for this workflow design?

Thanks again,

Bilal

Hi @nlp-guy,

The issue with the text_input losing focus is actually a bug in Prodigy. We're working on a patch; I'll ping you here when it's out (it should be this week).

As for your implementation, nothing stands out to me as incorrect, except for nitpicks such as the attribute names in the input field configuration (it's field_placeholder, not placeholder, and field_rows, not rows). You should also set "field_autofocus" to True, so the full blocks definition should be:

"blocks": [
    {"view_id": "html", "html": page_html},
    {
        "view_id": "text_input",
        "field_id": "edited_text",
        "field_label": "Normalized Text",
        "field_placeholder": "Enter the normalized/catalog-ready text…",
        "field_rows": 2,
        "field_autofocus": True,
    },
]