custom History text

I searched but could not find a way to customize the text displayed in the rows of the history. I have a use case where snippets are extracted from larger documents and the numbered snippets are displayed for labeling in an NER task. It's helpful to know which offset each numbered snippet comes from. Is it possible to display additional or custom information in the history pane for each instance of labeled data?

EDIT

Following this thread, I used blocks to define a ner_manual that pulled text from a defined input.text field. However, ner_manual still pulls from text and not from input.text as I expected, e.g.

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens
from prodigy.util import split_string
import spacy
from typing import List, Optional


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe(
    "ner.form-ui",
    dataset=("The dataset to use", "positional", None, str),
    spacy_model=("The base model", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
    exclude=("Names of datasets to exclude", "option", "e", split_string),
)
def ner_form_ui(
    dataset: str,
    spacy_model: str,
    source: str,
    label: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
):
    """
    Mark spans manually by token. Requires only a tokenizer and no entity
    recognizer, and doesn't do any active learning.
    """
    # Load the spaCy model for tokenization
    nlp = spacy.load(spacy_model)

    # Load the stream from a JSONL file and return a generator that yields a
    # dictionary for each example in the data.
    stream = JSONL(source)

    # Tokenize the incoming examples and add a "tokens" property to each
    # example. Also handles pre-defined selected spans. Tokenization allows
    # faster highlighting, because the selection can "snap" to token boundaries.
    stream = add_tokens(nlp, stream)

    blocks = [
        {"view_id": "ner_manual", "text":"input.text"}
        #{"view_id": "ner_manual"}
    ]

    return {
        "dataset": dataset,  # Name of dataset to save annotations
        "view_id": "blocks", # set the view_id to "blocks"
        "stream": stream,    # Incoming stream of examples
        "exclude": exclude,  # List of dataset names to exclude
        "config": {          # Additional config settings, mostly for app UI
            "lang": nlp.lang,
            "labels": label, # Selectable label options,
            "blocks": blocks # add the blocks to the config
        },
    }

With,

{"text": "BOOGER", "meta":{"form_name":"12588114345_wehf_SampleProjectSupportProposal.txt","page_number":0,"n":200,"jsonl_version":"0.01"},"input.text":"\n\nWALTER & ELISE  ...

but, prodigy ner.form-ui ner_forms en_core_web_lg ./data/processed/form_snippets.jsonl --label FIELD,DESCRIPTION,ANSWER -F ./src/models/form_ui.py

displays "BOOGER" instead of "\n\nWALTER & ELISE ..." as expected.


Hi! The link you shared to the other thread doesn't work for me (I think because you copied it from Notion and that turned it into a forwarded link). If the answer was referring to input.text, that would refer to "input": {"text": "..."}, but this is only used as a fallback, in case no other properties are available. By default, the history will use "text", but if there is no text (e.g. because the example refers to a different data type), Prodigy will try to find some other representative property in the example.

I do think having a way to explicitly define the history text would be nice, and I'll add it to my list of enhancements. Prodigy would then use the value of that property (e.g. something like "history_text") if it's available and only use the fallback logic if it's not defined.


Ah, thanks for catching the link. Definitely agree a property like that would be useful. What I did as a kludge is insert a line at the start of the text that represents what the history should show and hope for the best.

Just released Prodigy v1.11, which now allows tasks to provide a history_text field that's used first if available :tada:


Hi, I was just going through this thread and it's great that you added history text as an option. Is there a code sample I can use to see how to refer to it? Thanks!

You can just include a history_text in the examples you stream in, for instance:

{"image": "https://example.com/image.jpg", "history_text": "Image description"}
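For the snippet use case from the original question, here is a minimal sketch of how the field could be filled in from each example's "meta" before the stream is returned from a recipe. The helper name add_history_text and the label layout are assumptions made up for illustration, not part of Prodigy; only the "history_text" key itself comes from the answer above.

```python
from typing import Dict, Iterable, Iterator


def add_history_text(stream: Iterable[Dict]) -> Iterator[Dict]:
    """Attach a "history_text" value to each task (hypothetical helper)."""
    for eg in stream:
        meta = eg.get("meta", {})
        # Assumed label layout: "<form_name> p<page_number> #<n>".
        # Missing meta keys fall back to "?" so the label is always a string.
        eg["history_text"] = (
            f"{meta.get('form_name', '?')} "
            f"p{meta.get('page_number', '?')} "
            f"#{meta.get('n', '?')}"
        )
        yield eg


# Stand-in for the JSONL stream; the file name here is a placeholder.
examples = [
    {"text": "BOOGER", "meta": {"form_name": "sample.txt", "page_number": 0, "n": 200}}
]
tasks = list(add_history_text(examples))
```

In a custom recipe like the one earlier in the thread, this would presumably be applied as stream = add_history_text(stream) after add_tokens, so each history row shows where the snippet came from instead of the snippet text.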