Blocks and Progress bar

Hello,

I have a couple of questions that I hope you might be able to help me with:

  1. How can I show a number/percentage in the progress bar instead of the infinity symbol?
    I tried setting "show_stats": true in the prodigy.json file and in the recipe config, but nothing changes.
  2. Is it possible to customize the output of the annotation, e.g. extracting the information in a different way or outputting different information to the .jsonl file?
  3. Is it possible to create a blocks recipe where one of the blocks only appears depending on the answer to a previous block (e.g. the 1st block is of type choice and the 2nd block contains an HTML radio button that only appears depending on the answer to the choice block)?

Many thanks in advance!

Sofia

Hi! Prodigy will calculate the progress automatically if the stream returned by the recipe has a length (e.g. if it's a list). If the stream is a generator, it could potentially be infinite and it will only be read one batch at a time, so Prodigy can't know how many examples are left.
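
To illustrate the difference, here is a minimal sketch in plain Python (no Prodigy calls involved). The names example_stream and has_known_length are made up for this illustration. It only shows why a generator gives nothing to compute a percentage from, while a list does:

```python
def example_stream():
    """A generator: examples are produced lazily, so there is no length."""
    for i in range(5):
        yield {"text": f"Example {i}"}


def has_known_length(stream):
    """Return True if len() can be called on the stream."""
    return hasattr(stream, "__len__")


lazy_stream = example_stream()
print(has_known_length(lazy_stream))  # False: generators define no __len__

materialized_stream = list(example_stream())
print(has_known_length(materialized_stream))  # True
print(len(materialized_stream))  # 5
```

Note that converting the stream to a list loads every example into memory up front, so this fits best when the dataset is of a manageable size.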

If you know how many examples you have, or want to implement some other custom logic to calculate the progress, you can also add a "progress" callback to the components returned by your recipe: Custom Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP

(Just keep in mind that the progress is calculated on the server, so it's updated every time answers are sent back and not fully in real time. Calculating it on the server means that you can easily write custom logic and even take other things like the model into account. For example, in the active learning recipes, the progress is an estimate of when the loss might hit zero and there's nothing left to learn).
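
As a rough sketch of the "known total" case: the helper below counts answers as they come back and reports them as a fraction of a fixed total. The class name AnswerCounter is hypothetical, and the arguments Prodigy passes to the progress callback differ between versions, so the sketch accepts and ignores whatever it receives. Check the recipe documentation for your version before relying on it.

```python
class AnswerCounter:
    """Count submitted answers and report them as a fraction of a known total."""

    def __init__(self, total):
        self.total = total
        self.annotated = 0

    def update(self, answers):
        # Intended for the recipe's "update" component, which receives each
        # batch of answers sent back from the web app.
        self.annotated += len(answers)

    def progress(self, *args, **kwargs):
        # Assumption: the callback arguments vary by Prodigy version, so they
        # are ignored here and only our own counter is used.
        if self.total == 0:
            return 0.0
        return min(self.annotated / self.total, 1.0)


counter = AnswerCounter(total=200)
counter.update([
    {"text": "a", "answer": "accept"},
    {"text": "b", "answer": "reject"},
])
print(counter.progress())  # 0.01
```

In a recipe, this would presumably be wired up by returning "update": counter.update and "progress": counter.progress alongside the other components. Because the counter only changes when a batch of answers arrives, the displayed value moves in steps rather than per click.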

The annotations stored in Prodigy's database will have Prodigy's JSON format, but you can always access it and export it in a custom format, rearrange the information etc. For example, you can connect to the database in Python and load your annotations. This gives you a list of dictionaries with the data, that you can then modify and export however you like:

from prodigy.components.db import connect

db = connect()
# This is a list of dictionaries that you can modify and export
examples = db.get_examples("your_dataset")

You can also attach custom metadata to the examples you stream in and it will be preserved and saved with the annotations. This lets you include things like custom internal IDs, document meta information, and so on.
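
For example, once the annotations are loaded as a list of dictionaries, rearranging them is ordinary Python. The sketch below assumes choice-style annotations with "text", "answer" and "accept" keys. The "doc_id" field under "meta" is a hypothetical custom ID, and to_custom_record and export_jsonl are illustrative names, not part of Prodigy:

```python
import json
import sys


def to_custom_record(example):
    """Keep only the fields we care about, under our own key names."""
    return {
        "text": example.get("text"),
        "answer": example.get("answer"),
        "selected": example.get("accept", []),
        # "doc_id" is a hypothetical custom field attached via "meta"
        "doc_id": example.get("meta", {}).get("doc_id"),
    }


def export_jsonl(examples, file):
    """Write one JSON object per line to an open, writable file object."""
    for example in examples:
        file.write(json.dumps(to_custom_record(example)) + "\n")


# Stand-in for the list of dictionaries loaded from the database
examples = [
    {"text": "Hello", "answer": "accept", "accept": ["LABEL_ONE"], "meta": {"doc_id": 7}},
]
export_jsonl(examples, sys.stdout)
```

To write to disk instead, pass a file opened with open("custom_export.jsonl", "w", encoding="utf8") in place of sys.stdout.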

You could achieve something like that by adding custom JavaScript and listening to the prodigyupdate event that gets fired every time an update is made to the current task, for example, if an option is selected or unselected. You can then show/hide the radio button or any other content based on the contents of the "accept" key (the list of selected choice options).

In general, we do recommend keeping the interfaces straightforward and avoiding too many conditional changes of the UI. If the annotator can see everything they need to do upfront, it can reduce the potential for errors, lets them move faster and it also makes it easier later on to reproduce exactly what an annotator saw at any given point. So sometimes it can be more efficient to make several passes over the data and ask for different pieces of information each time.

Hello @ines,

Many thanks for your detailed explanation. It was very helpful!

Cool, I only had to add stream = list(stream) before the return and that solved the problem.

By running this code the examples list was empty - I guess you meant to use function get_database() instead of get_examples(). By using the get_database() function I was able to access the output data, as you said. Many thanks!

I'm not sure I understand how I would do this (sorry, I'm not used to working with JavaScript). Let us assume I have two blocks: a choice block and an HTML block defining a radio button that ideally only appears if the answer to the choice block is different from 0. Where do I define the custom JavaScript? I am defining the JavaScript for the radio button in the return. But for this I need to define the JavaScript before, right? To be able to return 2 blocks or only 1.

Many thanks again!

Sofia

Sorry, I meant get_dataset, yes! This was a typo.

Yeah, so you would define the JavaScript as the "javascript" key returned by your recipe's "config". It would then apply to every task. Under the hood, the HTML block with the radio button/checkbox would always be there, but it would be visually hidden unless something specific happens, for example, a certain option gets selected. So conceptually, the logic goes like this:

  • Trigger: the current example changes (e.g. because the annotator made a change).
    • Is choice option X selected?
      • Select the checkbox and mark it as visible / invisible.
  • Trigger: the checked status of the checkbox changes (e.g. because annotator ticked it).
    • Update the current example with that information.

Here's how this could look in code:

// This is called when Prodigy loads
document.addEventListener('prodigymount', event => {
    const checkbox = document.querySelector("#checkbox")
    // Hide the checkbox by default
    checkbox.style.display = "none"
    // If the checkbox is checked, update "custom_value" of the current task
    checkbox.addEventListener('change', event => {
        const checked = event.target.checked
        window.prodigy.update({ custom_value: checked })
    })
})

// This is called when a task is updated
document.addEventListener('prodigyupdate', event => {
    const { task } = event.detail
    const selected = task.accept || []  // the selected options
    const checkbox = document.querySelector("#checkbox")
    // Show the checkbox if LABEL_ONE is selected, hide it if it's not
    if (selected.includes('LABEL_ONE')) {
        checkbox.style.display = "block"
    }
    else {
        checkbox.style.display = "none"
    }
})

And here's what the recipe could return:

return {
    "dataset": dataset,
    "view_id": "blocks",
    "stream": stream,
    "config": {
        "javascript": JAVASCRIPT,
        "blocks": [
            {"view_id": "choice"},
            {"view_id": "html", "html": '<input id="checkbox" type="checkbox" />'},
        ],
    },
}
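
On the Python side, JAVASCRIPT is just a string that holds the script. A minimal sketch of how the pieces could fit together, with a shortened script and a hypothetical build_components helper standing in for the end of the recipe function:

```python
# The script is an ordinary Python string; this one is a shortened,
# illustrative version of the listeners above.
JAVASCRIPT = """
document.addEventListener('prodigyupdate', event => {
    const selected = event.detail.task.accept || []
    const checkbox = document.querySelector('#checkbox')
    checkbox.style.display = selected.includes('LABEL_ONE') ? 'block' : 'none'
})
"""


def build_components(dataset, stream):
    """Hypothetical helper: assemble the dictionary a recipe returns."""
    return {
        "dataset": dataset,
        "view_id": "blocks",
        "stream": stream,
        "config": {
            "javascript": JAVASCRIPT,
            "blocks": [
                {"view_id": "choice"},
                {"view_id": "html", "html": '<input id="checkbox" type="checkbox" />'},
            ],
        },
    }


components = build_components("my_dataset", [{"text": "Some text"}])
print(sorted(components["config"]))  # ['blocks', 'javascript']
```

For longer scripts, reading the string from a separate .js file (for instance with pathlib.Path.read_text) keeps the recipe file readable.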

Many many thanks for your help @ines!

@ines Hi... I added a list instead of a stream in the textcat.manual recipe. Now I can see the progress bar, but it always shows 0%.

Here is my code:

from typing import List, Optional, Dict, Any, Union, Iterable
import spacy
from prodigy.components.loaders import get_stream
from prodigy.components.preprocess import add_label_options, add_labels_to_stream
from prodigy.core import recipe
from prodigy.util import combine_models, log, msg, get_labels, split_string

@recipe(
    "textcat.manual.custom",
    # fmt: off
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Data to annotate (file path or '-' to read from standard input)", "positional", None, str),
    api=("DEPRECATED: API loader to use", "option", "a", str),
    loader=("Loader (guessed from file extension if not set)", "option", "lo", str),
    label=("Comma-separated label(s) to annotate or text file with one label per line", "option", "l", get_labels),
    exclusive=("Treat classes as mutually exclusive (if not set, an example can have multiple correct classes)", "flag", "E", bool),
    exclude=("Comma-separated list of dataset IDs whose annotations to exclude", "option", "e", split_string),
    # fmt: on
)
def manual(
    dataset: str,
    source: Union[str, Iterable[dict]] = "-",
    _=None,  # backwards-compat so we can show better error and plac doesn't fail
    api: Optional[str] = None,
    loader: Optional[str] = None,
    label: Optional[List[str]] = None,
    exclusive: bool = False,
    exclude: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """
    Manually annotate categories that apply to a text. If more than one label
    is specified, categories are added as multiple choice options. If the
    --exclusive flag is set, categories become mutually exclusive, meaning that
    only one can be selected during annotation.
    """
    log("RECIPE: Starting recipe textcat.manual", locals())
    # Check to show proper error message: second arg used to be spacy_model that
    # wasn't actually used, so we perform a hacky check here
    try:
        spacy.load(source)
        msg.fail(
            "The textcat.manual arguments have changed in v1.9",
            "It looks like you're passing in a spaCy model as the second "
            "argument, which is not needed anymore (and wasn't used before). "
            "Try removing the argument and run textcat.manual again. You can "
            "also run this command with --help or see the docs for details.",
            exits=1,
        )
    except IOError:
        pass
    labels = label
    if not labels:
        msg.fail("textcat.manual requires at least one --label", exits=1)
    has_options = len(labels) > 1
    log(f"RECIPE: Annotating with {len(labels)} labels", labels)
    stream = get_stream(source, api, loader, rehash=True, dedup=True, input_key="text")
    if has_options:
        stream = add_label_options(stream, label)
    else:
        stream = add_labels_to_stream(stream, label)

    # to calculate the progress automatically
    stream = list(stream)

    return {
        "view_id": "choice" if has_options else "classification",
        "dataset": dataset,
        "stream": stream,
        "exclude": exclude,
        "config": {
            "labels": labels,
            "choice_style": "single" if exclusive else "multiple",
            "exclude_by": "input" if has_options else "task",
        },
    }

I added stream = list(stream). What else do I need to do to make it work?

I am using version 1.9.9

Keep in mind that the progress is updated on the server (so you can perform any custom calculations) and sent back whenever you submit a batch of answers. So this is likely what's happening here: you've only annotated 2 examples, so no batch has been sent back yet. You should see an update once you've annotated two batches (with the default batch_size of 10). Alternatively, you can also set a lower batch size to see progress updates faster and submit answers earlier.

thanks. I tried annotating more than 10 examples but still didn't see any progress.

Also, where can I set the batch_size in this recipe?

With a default batch size of 10, you'll need to annotate an initial 20 examples until you get the first response: if you annotate 10 examples, those will be kept in the history in the sidebar and not sent to the server yet, so you can easily undo a decision. When you annotate 10 more, a full batch of examples is available to send back. You can also look at the requests the app makes in your browser's developer console and look for a request to /give_answers. That's the endpoint that will respond with the progress.

You can change the batch_size in your prodigy.json or in the "config" returned by the recipe.

@ines Thank you. For me it's working after annotating 25 examples. So I tried reducing the batch size in the following ways:
Config returned by recipe

return {
        "view_id": "choice" if has_options else "classification",
        "dataset": dataset,
        "stream": stream,
        "exclude": exclude,
        "config": {
            "labels": labels,
            "choice_style": "single" if exclusive else "multiple",
            "exclude_by": "input" if has_options else "task",
            "batch_size": 2 ,
        },
    }

and in the config while invoking the app

prodigy.serve("textcat.manual.custom",
                tagged_dataset_name,
                raw_jsonl_path,
                None,
                None,
                None,
                labels,
                True,
                None,
                port=port, show_stats = True,  batch_size = 2)

@ines Sorry I forgot to mention that even after changing the configs it didn't work. It's still showing the progress after a batch of 25.

This sounds like the batch size setting might be overridden somewhere else. Maybe double-check that you don't have a different batch_size defined in your global prodigy.json or a local file in the working directory, because this would override the setting defined in the recipe.
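
One way to check is a small script that looks for the setting in each candidate file. This is only a sketch: find_setting is a made-up helper, and the two paths are assumptions about the usual locations (a .prodigy folder in the home directory and the current working directory), which may differ on your machine, for example if PRODIGY_HOME is set.

```python
import json
from pathlib import Path


def find_setting(key, config_paths):
    """Return {path: value} for every existing JSON config that defines key."""
    found = {}
    for path in config_paths:
        path = Path(path)
        if not path.is_file():
            continue
        with path.open(encoding="utf8") as f:
            config = json.load(f)
        if key in config:
            found[str(path)] = config[key]
    return found


# Assumed default locations; adjust these for your own setup
candidates = [
    Path.home() / ".prodigy" / "prodigy.json",
    Path.cwd() / "prodigy.json",
]
print(find_setting("batch_size", candidates))
```

An empty dictionary means neither file sets batch_size, in which case the override would have to come from somewhere else.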

Hi @ines ,

Following my initial question of making the checkbox invisible or visible, depending on the answer to a choice block: can that be done with a choice block and a ner_manual block? i.e. having the ner_manual block only appearing depending on the answer to the choice block.

Many thanks in advance,
Sofia

Hi @ines , wondering if you can still help me with the above question?
Many thanks in advance!

In theory, you could do this in a similar way and use an event listener on prodigyupdate to check whether task.accept includes the ID of the option that you want to track. You can then toggle the visibility of the ner_manual block depending on whether the option is selected. The targeting is potentially a bit hacky but you could use the :nth-child selector. So if your ner_manual block is the second block, you should be able to do:

const nerManualBlock = document.querySelector('.prodigy-content:nth-child(2)')
nerManualBlock.style.display = 'none'  // or 'block' to show

Hi @ines ,

Thanks a lot. By doing this we make the text invisible but not the entities' labels.

Taking an example from your documentation:

The sentence to annotate "First look at the new MacBook Pro" would be invisible but not the labels Person, Org and Product. What do I have to pass to the querySelector function to make this part also invisible?

Many thanks again!

Sofia

Glad it worked! :tada:

Ah, I forgot about that one: the selector for the header is .prodigy-title-wrapper!

If you want to select other elements, you can also open your browser's developer console and find the container, which will show you the class names.

Some elements will have persistent human-readable names starting with .prodigy-.... (The .c... class names are auto-generated, so those may change in future versions as the app changes.)


Thank you so much @ines! Always very helpful.


@ines I had a quick question. I was able to add the progress bar, but I still see the infinity symbol instead of a percentage complete. How/where can I turn this into a bar with a percentage?

In my recipe:

   return {
        "dataset": dataset,
        "view_id": "blocks",
        "stream": list(stream),
        "config": {
          "blocks": blocks,
          "choice_style": "multiple",
          "javascript": javascript
        }
    }


Hi @cheyanneb.

That's strange. I'd expect Prodigy to recognize that the list isn't infinite. Is it possible for you to share a minimum reproducible example? I just tried running a custom variant of textcat.manual locally and could confirm that turning the generator into a list gave me the progress bar I expected.

This is the recipe I used:

from typing import List, Optional
import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string


# Helper functions for adding user provided labels to annotation tasks.
def add_label_options_to_stream(stream, labels):
    options = [{"id": label, "text": label} for label in labels]
    for task in stream:
        task["options"] = options
        yield task

def add_labels_to_stream(stream, labels):
    for task in stream:
        task["label"] = labels[0]
        yield task

# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe(
    "textcat.custom",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
    exclusive=("Treat classes as mutually exclusive", "flag", "E", bool),
    exclude=("Names of datasets to exclude", "option", "e", split_string),
)
def textcat_manual(
    dataset: str,
    source: str,
    label: Optional[List[str]] = None,
    exclusive: bool = False,
    exclude: Optional[List[str]] = None,
):
    """
    Manually annotate categories that apply to a text. If more than one label
    is specified, categories are added as multiple choice options. If the
    --exclusive flag is set, categories become mutually exclusive, meaning that
    only one can be selected during annotation.
    """

    # Load the stream from a JSONL file and return a generator that yields a
    # dictionary for each example in the data.
    stream = JSONL(source)

    # Add labels to each task in the stream
    has_options = len(label) > 1
    if has_options:
        stream = add_label_options_to_stream(stream, label)
    else:
        stream = add_labels_to_stream(stream, label)

    return {
        "view_id": "choice" if has_options else "classification",  # Annotation interface to use
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": list(stream),  # Incoming stream of examples
        "exclude": exclude,  # List of dataset names to exclude
        "config": {  # Additional config settings, mostly for app UI
            "choice_style": "single" if exclusive else "multiple", # Style of choice interface
            "exclude_by": "input" if has_options else "task", # Hash value used to filter out already seen examples
        },
    }

As an alternative in the meantime, Prodigy offers a progress callback that might offer a solution. This allows you to customize the progress percentage shown in the interface.
