Exactly, but I didn't think of using audio.manual
the way you described. Thanks.
Happy to hear it
Would it be possible to modify the UI to support annotations for hierarchical labels?
You can always consider making your own custom HTML template for this, but this would be a fair amount of work. It does feel like you're asking for a component that's not natively supported via the blocks mechanism.
Can we pass the file name for labels to the prodigy command? I have 27 labels for surgical tasks in a video for annotation. It will be much cleaner to load labels from files when there are many labels.
What I'm about to propose here is kind of a two-step approach. In the first step, you could select regions of interest. Via something like:
prodigy audio.manual issue-6341 videos --loader video --label REGION_OF_INTEREST
This will allow you to save many regions, like so:
Here's the thing that's nice. You can select regions that really need to have a label attached. And you can also choose to omit regions that do not require a label.
Given that we now have a dataset full of regions of interest, we can move on to an annotation interface where we can attach an appropriate label by using the text_input interface instead of the default choice one. This interface can autocomplete the input, which allows you to more easily select from a large set of options.
Here's what this interface looks like:
The code
Here's the code you need for this custom recipe.
import prodigy
from typing import List
from prodigy.components.db import connect
from prodigy.util import log, set_hashes, file_to_b64
from prodigy.types import TaskType, RecipeSettingsType
def remove_base64(examples: List[TaskType]) -> List[TaskType]:
    """Replace the base64-encoded string with the file path before saving to the DB."""
    for eg in examples:
        if "audio" in eg and eg["audio"].startswith("data:") and "path" in eg:
            eg["audio"] = eg["path"]
        if "video" in eg and eg["video"].startswith("data:") and "path" in eg:
            eg["video"] = eg["path"]
    return examples
@prodigy.recipe(
    "medical.custom",
    # fmt: off
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Dataset to annotate from", "positional", None, str),
    # fmt: on
)
def custom(dataset: str, source: str) -> RecipeSettingsType:
    db = connect()
    stream = db.get_dataset_examples(source)

    def split_stream_per_span(stream):
        # Turn each annotated example into one task per selected span.
        for item in stream:
            for span in item.get("audio_spans", []):
                item_copy = dict(item)
                item_copy["audio_spans"] = [span]
                # Drop metadata left over from the previous annotation round.
                item_copy.pop("answer", None)
                item_copy.pop("_timestamp", None)
                item_copy.pop("_is_binary", None)
                # Re-encode the video so the browser can render it.
                item_copy["video"] = file_to_b64(item_copy["video"])
                yield set_hashes(item_copy, overwrite=True)

    stream = split_stream_per_span(stream)
    log("RECIPE: Starting recipe medical.custom", locals())
    blocks = [
        {"view_id": "audio_manual"},
        {"view_id": "text"},
        {
            "view_id": "text_input",
            "field_rows": 1,
            "field_label": "label",
            "field_id": "user_label",
            "field_autofocus": True,
            "field_suggestions": [
                "Stage Early - Situation Mild",
                "Stage Middle - Situation Mild",
                "Stage End - Situation Mild",
                "Stage Early - Situation Severe",
                "Stage Middle - Situation Severe",
                "Stage End - Situation Severe",
                "Other",
            ],
        },
    ]
    return {
        "view_id": "blocks",
        "dataset": dataset,
        "stream": stream,
        "before_db": remove_base64,
        "config": {
            "blocks": blocks,
            "labels": ["REGION_OF_INTEREST"],
            "audio_autoplay": False,
            "auto_count_stream": True,
        },
    }
Note how the text input field uses field_suggestions. You'd have to populate this list yourself.
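This also relates to your earlier question about passing a label file on the command line: because this is a custom recipe in plain Python, nothing stops you from reading the suggestions from a file. Here's a minimal sketch that assumes a plain-text file with one label per line; the file name and helper function are hypothetical, not part of the recipe above.

from pathlib import Path
from typing import List

def load_label_suggestions(path: str) -> List[str]:
    """Read one label per line, skipping empty lines."""
    lines = Path(path).read_text(encoding="utf8").splitlines()
    return [line.strip() for line in lines if line.strip()]

# e.g. inside the recipe:
# "field_suggestions": load_label_suggestions("surgical_labels.txt"),

If you like, you could also expose the file path as an extra recipe argument, the same way dataset and source are declared in the @prodigy.recipe decorator.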
Remember how before we selected regions of interest via:
prodigy audio.manual issue-6341 videos --loader video --label REGION_OF_INTEREST
This recipe can take the issue-6341 dataset and iterate over each span so that you can attach the right label to it.
python -m prodigy medical.custom issue-6341-annot issue-6341 -F recipe.py
When you annotate this, the annotations for each span will look like this:
{"video":"videos/CleanShot 2023-02-10 at 14.26.05.mp4","text":"CleanShot 2023-02-10 at 14.26.05","meta":{"file":"CleanShot 2023-02-10 at 14.26.05.mp4"},"path":"videos/CleanShot 2023-02-10 at 14.26.05.mp4","_input_hash":-313987848,"_task_hash":-1332879284,"_view_id":"blocks","audio_spans":[{"start":1.1108963076,"end":2.2666269953,"label":"REGION_OF_INTEREST","id":"a25e6bd1-1347-41b1-bc61-2d6995cc61c3","color":"rgba(255,215,0,0.2)"}],"user_label":"Stage Middle - Situation Mild","answer":"accept","_timestamp":1676631261}
{"video":"videos/CleanShot 2023-02-10 at 14.26.05.mp4","text":"CleanShot 2023-02-10 at 14.26.05","meta":{"file":"CleanShot 2023-02-10 at 14.26.05.mp4"},"path":"videos/CleanShot 2023-02-10 at 14.26.05.mp4","_input_hash":-313987848,"_task_hash":386145696,"_view_id":"blocks","audio_spans":[{"start":3.2131305757,"end":4.617941153,"label":"REGION_OF_INTEREST","id":"a163628e-6782-4ff8-acf7-6ae2f425a88f","color":"rgba(255,215,0,0.2)"}],"user_label":"Stage Early - Situation Severe","answer":"accept","_timestamp":1676631263}
{"video":"videos/CleanShot 2023-02-10 at 14.26.05.mp4","text":"CleanShot 2023-02-10 at 14.26.05","meta":{"file":"CleanShot 2023-02-10 at 14.26.05.mp4"},"path":"videos/CleanShot 2023-02-10 at 14.26.05.mp4","_input_hash":-313987848,"_task_hash":17097882,"_view_id":"blocks","audio_spans":[{"start":5.2555856703,"end":6.0227517303,"label":"REGION_OF_INTEREST","id":"fb958448-25d6-4f09-96b2-9693d80a9a19","color":"rgba(255,215,0,0.2)"}],"user_label":"Stage End - Situation Severe","answer":"accept","_timestamp":1676631265}
Notice how each example has a "user_label"? That's a string that can also contain the hierarchical information. Note that each JSON line also contains the filename of the video as well as the filepath.
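If you want the hierarchy back as separate fields downstream, you can split the user_label string after exporting the dataset. Just a sketch: it assumes the "Stage ... - Situation ..." naming from the suggestions above and an annotations.jsonl file exported via prodigy db-out.

import srsly  # bundled with Prodigy

def parse_user_label(label: str) -> dict:
    """Split e.g. 'Stage Middle - Situation Mild' into its two levels."""
    parts = [p.strip() for p in label.split(" - ")]
    if len(parts) == 2:
        return {"stage": parts[0], "situation": parts[1]}
    return {"stage": None, "situation": label}  # e.g. the "Other" label

for eg in srsly.read_jsonl("annotations.jsonl"):
    print(parse_user_label(eg["user_label"]))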
From here you could use the same trick as before to get frames for each of these spans.
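I won't repeat the full details here, but just to sketch one way of doing it: loop over the exported spans and call out to ffmpeg (assuming it's installed) with the stored path and each span's start/end. The output folder, file names and the 5 fps sampling rate are arbitrary choices.

import os
import subprocess
import srsly

os.makedirs("frames", exist_ok=True)
for i, eg in enumerate(srsly.read_jsonl("annotations.jsonl")):
    span = eg["audio_spans"][0]          # each example holds exactly one span
    start = span["start"]
    duration = span["end"] - span["start"]
    subprocess.run([
        "ffmpeg",
        "-ss", str(start),               # seek to the start of the span
        "-t", str(duration),             # only keep the span's duration
        "-i", eg["path"],                # file path stored by the recipe
        "-vf", "fps=5",                  # sample 5 frames per second
        f"frames/span-{i:03d}-%04d.png",
    ], check=True)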
Quick Reflection
On reflection, I think the two-step approach might be somewhat preferable to your original suggestion. By splitting the work into two steps you end up with two relatively simple tasks that require little mouse-cursor movement. This may make annotation a lot quicker and it may also be less error-prone.
I may be glossing over some important issues though, so feel free to correct me if I'm wrong.
Let me know!
Final Detail
While working on this I did realize that there's currently one feature missing from the audio interface that would make this workflow smoother: the audio cursor always starts at the beginning. For this workflow it'd be better if it started where the selected span starts. It might also be nice if the interface could show only a subset of the audio. I'll discuss with the team whether this might be a nice feature for the future.