I have a couple of questions that I hope you might be able to help me with:
How can I have a number/percentage in the progress bar instead of the infinite symbol? As below:
I tried setting "show_stats": true in the prodigy.json file and in the recipe config, but nothing changes.
Is it possible to customize the output of the annotation? e.g. extracting the information in a different way / outputting different information to the .jsonl file?
Is it possible to create a blocks recipe where one of the blocks only appears depending on the answer to a previous block (e.g. the 1st block is of type choice and the 2nd block contains an html radio button that only appears depending on the answer to the choice block)?
Hi! Prodigy will calculate the progress automatically if the stream returned by the recipe has a length (e.g. if it's a list). If the stream is a generator, it could potentially be infinite and it will only be read one batch at a time, so Prodigy can't know how many examples are left.
(Just keep in mind that the progress is calculated on the server, so it's updated every time answers are sent back and not fully in real time. Calculating it on the server means that you can easily write custom logic and even take other things like the model into account. For example, in the active learning recipes, the progress is an estimate of when the loss might hit zero and there's nothing left to learn.)
The annotations stored in Prodigy's database will have Prodigy's JSON format, but you can always access it and export it in a custom format, rearrange the information etc. For example, you can connect to the database in Python and load your annotations. This gives you a list of dictionaries with the data, that you can then modify and export however you like:
from prodigy.components.db import connect
db = connect()
# This is a list of dictionaries that you can modify and export
examples = db.get_examples("your_dataset")
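Here's a minimal sketch of that last step: taking the list of dictionaries and writing out a slimmed-down JSONL file. The helper name and the output fields are made up for illustration; pick whatever fields your downstream process needs.

```python
import json

def export_custom_jsonl(examples, path):
    # Keep only the text and the selected choice options for each example.
    # "text" and "accept" are standard keys in Prodigy's task format.
    with open(path, "w", encoding="utf-8") as f:
        for eg in examples:
            row = {"text": eg.get("text"), "selected": eg.get("accept", [])}
            f.write(json.dumps(row) + "\n")
```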
You can also attach custom metadata to the examples you stream in and it will be preserved and saved with the annotations. This lets you include things like custom internal IDs, document meta information, and so on.
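A minimal sketch of what that could look like (the wrapper name and the "meta" keys here are just illustrative; any extra keys you add to a task dict are passed through and stored with the annotation):

```python
def add_metadata(stream, doc_source):
    # Attach custom metadata to each task before it goes out for annotation
    for i, task in enumerate(stream):
        task["meta"] = {"source": doc_source, "internal_id": f"doc-{i}"}
        yield task
```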
You could achieve something like that by adding custom JavaScript and listening to the prodigyupdate event that gets fired every time an update is made to the current task, for example, if an option is selected or unselected. You can then show/hide the radio button or any other content based on the contents of the "accept" key (the list of selected choice options).
In general, we do recommend keeping the interfaces straightforward and avoiding too many conditional changes of the UI. If the annotator can see everything they need to do upfront, it can reduce the potential for errors, lets them move faster and it also makes it easier later on to reproduce exactly what an annotator saw at any given point. So sometimes it can be more efficient to make several passes over the data and ask for different pieces of information each time.
Many thanks for your detailed explanation. It was very helpful!
Cool, I only had to add stream = list(stream) before the return and that solved the problem.
When I ran this code, the examples list was empty - I guess you meant to use the function get_database() instead of get_examples(). Using the get_database() function, I was able to access the output data, as you said. Many thanks!
I'm not sure I understand how I would do this (sorry, not used to working with JavaScript). Let us assume I have two blocks: a choice block and an html block defining a radio button that should ideally only appear if the answer to the choice block is different from 0. Where do I define the custom JavaScript? I am defining the JavaScript for the radio button in the return. But for this I need to define the JavaScript before, right? To be able to return 2 blocks or only 1.
Yeah, so you would define the JavaScript as the "javascript" key returned by your recipe's "config". It would then apply to every task. Under the hood, the HTML block with the radio button/checkbox would always be there, but it would be visually hidden unless something specific happens, for example, a certain option gets selected. So conceptually, the logic goes like this:
Trigger: the current example changes (e.g. because the annotator made a change).
Is choice option X selected?
Select the checkbox and mark it as visible / invisible.
Trigger: the checked status of the checkbox changes (e.g. because annotator ticked it).
Update the current example with that information.
Here's how this could look in code:
// This is called when Prodigy loads
document.addEventListener('prodigymount', event => {
    const checkbox = document.querySelector('#checkbox')
    // Hide the checkbox by default
    checkbox.style.display = 'none'
    // If the checkbox is checked, update "custom_value" of the current task
    checkbox.addEventListener('change', event => {
        window.prodigy.update({ custom_value: event.target.checked })
    })
})

// This is called when a task is updated
document.addEventListener('prodigyupdate', event => {
    const { task } = event.detail
    const selected = task.accept || []  // the selected options
    const checkbox = document.querySelector('#checkbox')
    // Show the checkbox if LABEL_ONE is selected, hide it if it's not
    if (selected.includes('LABEL_ONE')) {
        checkbox.style.display = 'block'
    } else {
        checkbox.style.display = 'none'
    }
})
@ines Hi... I added a list instead of a generator stream in the textcat.manual recipe. Now I can see the progress bar, but it always shows 0%.
Here is my code:
from typing import List, Optional, Dict, Any, Union, Iterable

import spacy

from prodigy.components.loaders import get_stream
from prodigy.components.preprocess import add_label_options, add_labels_to_stream
from prodigy.core import recipe
from prodigy.util import combine_models, log, msg, get_labels, split_string


@recipe(
    "textcat.manual.custom",
    # fmt: off
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Data to annotate (file path or '-' to read from standard input)", "positional", None, str),
    api=("DEPRECATED: API loader to use", "option", "a", str),
    loader=("Loader (guessed from file extension if not set)", "option", "lo", str),
    label=("Comma-separated label(s) to annotate or text file with one label per line", "option", "l", get_labels),
    exclusive=("Treat classes as mutually exclusive (if not set, an example can have multiple correct classes)", "flag", "E", bool),
    exclude=("Comma-separated list of dataset IDs whose annotations to exclude", "option", "e", split_string),
    # fmt: on
)
def manual(
    dataset: str,
    source: Union[str, Iterable[dict]] = "-",
    _=None,  # backwards-compat so we can show better error and plac doesn't fail
    api: Optional[str] = None,
    loader: Optional[str] = None,
    label: Optional[List[str]] = None,
    exclusive: bool = False,
    exclude: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """
    Manually annotate categories that apply to a text. If more than one label
    is specified, categories are added as multiple choice options. If the
    --exclusive flag is set, categories become mutually exclusive, meaning that
    only one can be selected during annotation.
    """
    log("RECIPE: Starting recipe textcat.manual", locals())
    # Check to show proper error message: second arg used to be spacy_model that
    # wasn't actually used, so we perform a hacky check here
    try:
        spacy.load(source)
        msg.fail(
            "The textcat.manual arguments have changed in v1.9",
            "It looks like you're passing in a spaCy model as the second "
            "argument, which is not needed anymore (and wasn't used before). "
            "Try removing the argument and run textcat.manual again. You can "
            "also run this command with --help or see the docs for details.",
            exits=1,
        )
    except IOError:
        pass
    labels = label
    if not labels:
        msg.fail("textcat.manual requires at least one --label", exits=1)
    has_options = len(labels) > 1
    log(f"RECIPE: Annotating with {len(labels)} labels", labels)
    stream = get_stream(source, api, loader, rehash=True, dedup=True, input_key="text")
    if has_options:
        stream = add_label_options(stream, label)
    else:
        stream = add_labels_to_stream(stream, label)
    # Turn the stream into a list so the progress can be calculated automatically
    stream = list(stream)
    return {
        "view_id": "choice" if has_options else "classification",
        "dataset": dataset,
        "stream": stream,
        "exclude": exclude,
        "config": {
            "labels": labels,
            "choice_style": "single" if exclusive else "multiple",
            "exclude_by": "input" if has_options else "task",
        },
    }
I added stream = list(stream), but what else do I need to do to make it work?
Keep in mind that the progress is updated on the server (so you can perform any custom calculations) and sent back whenever you submit a batch of answers. So this is likely what's happening here: you've only annotated 2 examples, so no batch has been sent back yet. You should see an update once you've annotated two batches (with the default batch_size of 10). Alternatively, you can also set a lower batch size to see progress updates faster and submit answers earlier.
With a default batch size of 10, you'll need to annotate an initial 20 examples until you get the first response: if you annotate 10 examples, those will be kept in the history in the sidebar and not sent to the server yet, so you can easily undo a decision. When you annotate 10 more, a full batch of examples is available to send back. You can also look at the requests the app makes in your browser's developer console and look for a request to /give_answers. That's the endpoint that will respond with the progress.
You can change the batch_size in your prodigy.json or in the "config" returned by the recipe.
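In a custom recipe, the batch size is just another entry in the returned "config". A sketch with only the relevant keys shown (the helper name is made up for illustration):

```python
def recipe_components(dataset, stream):
    # Only the keys relevant to this example are shown; a real recipe
    # would also set labels, exclude, etc.
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "choice",
        "config": {"batch_size": 5},  # send/receive answers in batches of 5
    }
```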
This sounds like the batch size setting might be overwritten somewhere else? Maybe double-check that you don't have a different batch_size defined in your global prodigy.json or a local file in the working directory, because that would overwrite the setting defined in the recipe.
Following my initial question of making the checkbox invisible or visible, depending on the answer to a choice block: can that be done with a choice block and a ner_manual block? i.e. having the ner_manual block only appearing depending on the answer to the choice block.
In theory, you could do this in a similar way and use an event listener on prodigyupdate to check whether task.accept includes the ID of the option that you want to track. You can then toggle the visibility of the ner_manual block depending on whether the option is selected. The targeting is potentially a bit hacky but you could use the :nth-child selector. So if your ner_manual block is the second block, you should be able to do:
const nerManualBlock = document.querySelector('.prodigy-content:nth-child(2)')
nerManualBlock.style.display = 'none' // or 'block' to show
The sentence to annotate "First look at the new MacBook Pro" would be invisible, but not the labels Person, Org and Product. What do I have to pass to the querySelector function to make this part invisible as well?
Some elements will have persistent human-readable names starting with .prodigy-.... (The .c... class names are auto-generated, so those may change in future versions as the app changes.)
@ines I had a quick question. I was able to add the progress bar, but I still see the infinity symbol instead of a percentage complete. How/where can I turn this into a bar with a percentage?
That's strange. I'd expect Prodigy to recognize that the list isn't infinite. Is it possible for you to share a minimal reproducible example? I just tried running a custom variant of textcat.manual locally and confirmed that turning the generator into a list gave me the progress bar I expected.
This is the recipe I used:
from typing import List, Optional

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string


# Helper functions for adding user provided labels to annotation tasks.
def add_label_options_to_stream(stream, labels):
    options = [{"id": label, "text": label} for label in labels]
    for task in stream:
        task["options"] = options
        yield task


def add_labels_to_stream(stream, labels):
    for task in stream:
        task["label"] = labels[0]
        yield task


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe(
    "textcat.custom",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
    exclusive=("Treat classes as mutually exclusive", "flag", "E", bool),
    exclude=("Names of datasets to exclude", "option", "e", split_string),
)
def textcat_manual(
    dataset: str,
    source: str,
    label: Optional[List[str]] = None,
    exclusive: bool = False,
    exclude: Optional[List[str]] = None,
):
    """
    Manually annotate categories that apply to a text. If more than one label
    is specified, categories are added as multiple choice options. If the
    --exclusive flag is set, categories become mutually exclusive, meaning that
    only one can be selected during annotation.
    """
    # Load the stream from a JSONL file and return a generator that yields a
    # dictionary for each example in the data.
    stream = JSONL(source)
    # Add labels to each task in the stream
    has_options = len(label) > 1
    if has_options:
        stream = add_label_options_to_stream(stream, label)
    else:
        stream = add_labels_to_stream(stream, label)
    return {
        "view_id": "choice" if has_options else "classification",  # Annotation interface to use
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": list(stream),  # Incoming stream of examples
        "exclude": exclude,  # List of dataset names to exclude
        "config": {  # Additional config settings, mostly for app UI
            "choice_style": "single" if exclusive else "multiple",  # Style of choice interface
            "exclude_by": "input" if has_options else "task",  # Hash value used to filter out already seen examples
        },
    }
As an alternative in the meantime, Prodigy also supports a custom progress callback returned by the recipe, which lets you control the progress percentage shown in the interface yourself.
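A rough sketch of what such a callback could look like, assuming the documented two-argument signature (the controller and the value returned by the update callback) and the controller's total_annotated count; please double-check both against the docs for your Prodigy version before relying on them:

```python
def get_progress_fn(total):
    # Hypothetical helper: builds a progress callback for a known total
    # number of examples. The returned function would go under the
    # "progress" key of the dict a custom recipe returns.
    def progress(ctrl, update_return_value):
        # Return a float between 0 and 1, capped at 100%
        return min(ctrl.total_annotated / total, 1.0)
    return progress
```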