How to use different labels for individual audio files?

Hello everybody,

I'm about to use Prodigy to annotate segments within an audio file. The files contain news broadcasts. I want to annotate the start and end time of each news segment and assign it the ID of the speaker's manuscript, since I need to be sure that each manuscript is linked to the correct segment of the audio.

The manuscripts with text and ID of each news segment are in the metadata of each audio file. See below for data structure.

Today I dug into the workings of custom recipes, but unfortunately I could not find a way to set the labels from within my recipe. Are recipes even the right spot to start?

I also want to display the title and text of all the manuscripts that appear in the audio file below the annotation view. I experimented with blocks, but hit the same problem: I could not find a way to access the metadata through the recipe.

The data I pass in with each annotation task looks like this:

{
    "audio": "file.mp3",
    "text": "file",
    "meta": {
        "file": "file.mp3",
        "manuscript": {
            "id1": "news text",
            "id2": "more news text from another item"
        }
    },
    "path": "file.mp3",
    "_input_hash": 1234,
    "_task_hash": 1235,
    "_is_binary": False,
    "_view_id": "blocks",
    "audio_spans": [
        {
            "start": 0.0,
            "end": 10.0,
            "label": "id1",
            "id": "id1",
            "color": "rgba(255,215,0,0.2)",
        },
        {
            "start": 10.0,
            "end": 20.0,
            "label": "id2",
            "id": "id2",
            "color": "rgba(255,215,0,0.2)",
        }
    ],
    "answer": "accept",
    "_timestamp": 1698064248,
}

Mainly I do have two questions:

  1. How can I display different labels (the manuscripts' IDs) for each audio file?
  2. Is it possible to display the text of each manuscript to support the correct assignment of the segments?

Thank you very much for your answers and support!

Best,
redadmiral

Display task specific information

After a bit of tinkering around and actually reading the Custom Interface documentation, I found an answer to my second question.

To display task specific information below the annotation view, I use an html block in my recipe.py...

    blocks = [
        {"view_id": "audio_manual"},
        {"view_id": "html"},
    ]

...and add the necessary information to my annotation data...

        "meta": {
            "file": "data/prodigy/audio/record_b1main_20231023T062900+0200.mp3",
            "manuscripts": [{"id": "omid3", "text": "test"}, {"id": "omid2", "text": "test1"}, {"id": "omid1", "text":  "test2"}],
        },

...to display it in an HTML template located in my prodigy.json file:

{
    "html_template": "{{#meta.manuscripts}}<p> {{ id }}: {{ text }}</p>{{/meta.manuscripts}}"
}
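As an alternative to the global template, the same list could be built per task inside the recipe. The following is only a sketch: the helper names `add_manuscript_html` and `with_manuscript_html` are made up for illustration, and it assumes that the html block renders a per-task "html" key when no template is configured.

```python
from html import escape


def add_manuscript_html(task):
    """Return a copy of the task with an "html" key listing its manuscripts.

    Hypothetical helper: assumes the html block displays task["html"].
    """
    manuscripts = task.get("meta", {}).get("manuscripts", [])
    paragraphs = [
        # Escape the values so manuscript text cannot break the markup.
        f"<p>{escape(str(m['id']))}: {escape(str(m['text']))}</p>"
        for m in manuscripts
    ]
    return {**task, "html": "".join(paragraphs)}


def with_manuscript_html(stream):
    """Wrap a stream of tasks so every task carries its own HTML."""
    for task in stream:
        yield add_manuscript_html(task)


example_task = {
    "audio": "file.mp3",
    "meta": {
        "file": "file.mp3",
        "manuscripts": [
            {"id": "omid3", "text": "test"},
            {"id": "omid2", "text": "test1"},
        ],
    },
}

print(add_manuscript_html(example_task)["html"])
```

In a recipe, the stream would be wrapped with `with_manuscript_html(...)` before it is returned, so the template in prodigy.json is no longer needed.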

Use different labels for individual items

Unfortunately, I think I've found that it's not possible to use different labels for each item in one annotation project. I had a look at the network requests from the frontend: the labels are only transmitted once, with the call to the project endpoint, and are not included in the call to the get_session_question endpoint, so I think the labels have to stay the same for the whole annotation session.

Please correct me if I'm wrong - I kinda hope I am :crossed_fingers:

Hi there.

Happy to hear the Custom Interface documentation helped!

Just to make sure that we're talking about the same thing. Are you worried that you can't have unique items shown in the html or are you worried that you must have static labels? To be clear:

To answer:

  1. Yes you can have different items shown in the HTML.
  2. The labels indeed have to remain static for all the examples in a single recipe and must be defined upfront.

If you give more details about your project however I might be able to think along.

Hi Vincent,

thank you very much for your reply!

I need to annotate news radio broadcasts. Each broadcast consists of 3-5 news events. Each event has an assigned ID from a database. I want to annotate the start and the end of each news event in the audio file and assign it the corresponding ID from the database.

What I tried to achieve is this: the labels you marked with the red arrow in the blue header should show the IDs id1, id2, id3 for audio1.mp3 and the IDs id4, id5, id6 for the next file, audio2.mp3. This is just an example; the IDs come with the annotation data, as shown in the post above.

Since, if I understood you correctly, this is not possible, I'd be very happy if you have an idea how I could link the IDs to the static labels.

To make clearer what I mean: if I used the labels segment1, segment2, segment3, etc. for all the audio files, could I use a block to assign the segments to the IDs?

For audio1.mp3 the return would look something like this:

{ "segment1": "id1", "segment2": "id2", "segment3": "id3"}

and for audio2.mp3 it would look like this:

{ "segment1": "id4", "segment2": "id5", "segment3": "id6"}
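A rough sketch of how such a mapping could be resolved after annotation, outside the UI. It assumes that the static labels are named segment1, segment2, ... and that segmentN refers to the N-th entry of meta.manuscripts; `resolve_segment_ids` is a hypothetical helper, not part of Prodigy.

```python
def resolve_segment_ids(task, label_prefix="segment"):
    """Map static segment labels back to the manuscript IDs of one task.

    Assumption: "segmentN" refers to the N-th manuscript (1-based) in
    task["meta"]["manuscripts"].
    """
    manuscripts = task["meta"]["manuscripts"]
    label_to_id = {
        f"{label_prefix}{i}": m["id"]
        for i, m in enumerate(manuscripts, start=1)
    }
    resolved = []
    for span in task.get("audio_spans", []):
        label = span["label"]
        if label not in label_to_id:
            # More segments were labeled than manuscripts exist for this file.
            raise ValueError(f"No manuscript for label {label!r}")
        resolved.append({
            "start": span["start"],
            "end": span["end"],
            "label": label,
            "manuscript_id": label_to_id[label],
        })
    return label_to_id, resolved


annotated_task = {
    "audio": "audio2.mp3",
    "meta": {"manuscripts": [
        {"id": "id4", "text": "first item"},
        {"id": "id5", "text": "second item"},
        {"id": "id6", "text": "third item"},
    ]},
    "audio_spans": [
        {"start": 0.0, "end": 10.0, "label": "segment1"},
        {"start": 10.0, "end": 20.0, "label": "segment2"},
    ],
}

mapping, spans = resolve_segment_ids(annotated_task)
print(mapping)
print(spans)
```

This only works if the annotator labels the segments in the order in which the manuscripts are listed, which is why the manuscript texts would need to be visible next to the audio.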

Do you have an idea how to achieve this in the most user-friendly way possible? I could not find an overview of all the possible block elements; I'm sorry if I've missed something in the documentation.

Best,
redadmiral

My initial thinking would be to take a two-step approach.

  1. Have an annotation interface that can distinguish between "segments".
  2. Have another interface that shows one of these segments at a time and allows you to add tags.

Might this work?
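To make the second step a bit more concrete, here is a minimal sketch of how the output of the first pass could be turned into one task per segment. It assumes that the second interface combines an audio block with a choice block and that the choice block reads a per-task "options" list of {"id", "text"} entries; `segments_to_choice_tasks` is a made-up name.

```python
def segments_to_choice_tasks(annotated):
    """Yield one second-pass task per audio span from a first-pass example.

    Assumption: each task is shown with an audio block plus a choice block,
    and the choice options are the manuscripts of that particular file.
    """
    manuscripts = annotated["meta"]["manuscripts"]
    options = [
        {"id": m["id"], "text": f"{m['id']}: {m['text']}"}
        for m in manuscripts
    ]
    spans = sorted(annotated.get("audio_spans", []), key=lambda s: s["start"])
    for span in spans:
        yield {
            "audio": annotated["audio"],
            # Keep only the segment that should be tagged in this task.
            "audio_spans": [{
                "start": span["start"],
                "end": span["end"],
                "label": span["label"],
            }],
            # Copy the options so tasks do not share mutable state.
            "options": [dict(option) for option in options],
            "meta": {"file": annotated["audio"]},
        }


first_pass = {
    "audio": "audio1.mp3",
    "meta": {"manuscripts": [
        {"id": "id1", "text": "news text"},
        {"id": "id2", "text": "more news text from another item"},
    ]},
    "audio_spans": [
        {"start": 10.0, "end": 20.0, "label": "segment"},
        {"start": 0.0, "end": 10.0, "label": "segment"},
    ],
}

for task in segments_to_choice_tasks(first_pass):
    print(task["audio_spans"][0]["start"], [o["id"] for o in task["options"]])
```

Because the options travel with each task, the first pass can use a single static label, and the manuscript ID is only chosen in the second pass.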