Reviewing audio/video span annotations

Hi,
I set up a custom recipe for annotating spans in video files (using audio_manual). I tried using the review recipe to review the annotations and got a message saying: "reviewing 'audio_manual' annotations isn't supported yet".

So I tried to follow the instructions from this post (db-in). It opens the server and I see the names of my files at the bottom of the screen, but for some reason the video itself doesn't load. (The video files are stored locally on my computer. In the annotation recipe I use loader=video-server; for db-in, I tried 'video', 'video-server' and also 'jsonl'.)

What could be the issue? Is there maybe a better way for me to approach this?
I'd like to be able to view the manual annotations and correct them if necessary. I will eventually have two annotators, so I would need to be able to review their annotations and either choose the one to be saved into the 'gold-label' dataset, or modify and save it in the dataset.

If it helps - this is what I tried to run through the command line:

python -m prodigy OAR.video review_video_annotation_test dataset:video_annotation_test --label OAR_labels.txt --loader video-server -F OAR_video.py

(I also tried audio.manual instead of the custom recipe and without OAR_video.py)

and here's an example of the jsonl file with the annotations:

{"video": "/OAR_annotation/video_clips/02112016_las-vegas_NV/02112016_las-vegas_NV_1.mp4", "meta": {"file": "02112016_las-vegas_NV_1.mp4"}, "path": "OAR_annotation/video_clips/02112016_las-vegas_NV/02112016_las-vegas_NV_1.mp4", "_input_hash": -2137062564, "_task_hash": -1268654994, "_is_binary": false, "_view_id": "audio_manual", "audio_spans": [{"start": 0, "end": 217.0904034735, "label": "Other speaker", "id": "f306f945-05fd-4b6c-8de6-7b344d1530b1", "color": "rgba(153,50,204,0.2)"}], "answer": "accept", "_timestamp": 1735034758, "_annotator_id": "2024-12-24_12-05-02", "_session_id": "2024-12-24_12-05-02"}

I'd appreciate your help.
Thank you,
Gal

Hi @Gal_R,

When you store your video annotations without base64-encoded data URIs, the only reference to the original video file is the path stored under the path key. The video key will point to the temporary location where the video was cached. If you look at your annotated dataset, the structure would be something like:

{
  "video": "/user_files/test_video.mp4",
  "text": "test_video",
  "meta": {
    "file": "test_video.mp4"
  },
  "path": "video/video_dir/test_video.mp4",
...

In order to reload this dataset for revision with audio.manual, you'd need to make sure the video key points to the right path. If the path under path is still valid, the easiest thing to do would be a simple wrapper function in your loader (or a separate script) that copies the value of path to video.
You should then be able to review the annotations using audio.manual. Make sure to use the -FM (fetch media) flag so local paths are processed (the video-server loader does this automatically when reading from a folder on disk, but here the paths come from the input JSONL file).
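A minimal standalone sketch of that copy step (the file names here are placeholders, not taken from your setup):

```python
import json

def fix_video_path(task: dict) -> dict:
    """Copy the original file path stored under 'path' back to 'video',
    so the audio_manual UI can locate the media file again."""
    if "path" in task:
        task = dict(task)  # work on a copy, don't mutate the caller's dict
        task["video"] = task["path"]
    return task

def rewrite_jsonl(in_path: str, out_path: str) -> None:
    """Rewrite an exported dataset (e.g. from `prodigy db-out`) line by line."""
    with open(in_path, encoding="utf8") as fin, \
         open(out_path, "w", encoding="utf8") as fout:
        for line in fin:
            task = json.loads(line)
            fout.write(json.dumps(fix_video_path(task)) + "\n")
```

You could then feed the rewritten JSONL straight into audio.manual with the -FM flag.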

As for reviewing video annotations coming from multiple annotators:

The reason region-like annotations for images and audio aren't supported in review yet is that we haven't found a satisfying way of representing the diff view for them. Another tricky question is what should actually count as a difference: when users mark arbitrary regions manually, the regions will almost always differ at the pixel level. This can get messy very quickly, especially with more than two annotators.
As a workaround, you might want to try a custom pages recipe. Each annotator's version could be a separate page, followed by a final choice page where you make the selection (and tweak the preferred annotation manually if needed). The input to the pages UI can of course be built programmatically as a wrapper around the stream at the recipe level.
The recipe should probably also have a before_db callback that stores the final annotation based on the choice decision.
Ideally, you'd want to see all annotations on one page, but currently the UI would render only one video key.
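As a rough illustration of the stream-wrapper idea, grouping each video's competing annotations by _input_hash and appending a final choice page (the page_title key and overall pages structure here are assumptions for the sketch, not a drop-in recipe):

```python
from collections import defaultdict

def group_by_input(examples):
    """Group annotated tasks from multiple annotators by their _input_hash,
    so each video ends up with one list of competing annotations."""
    grouped = defaultdict(list)
    for eg in examples:
        grouped[eg["_input_hash"]].append(eg)
    return grouped

def make_pages_tasks(examples):
    """Build one multi-page task per input: one page per annotator's
    version, plus a final 'choice' page to pick the preferred one."""
    for input_hash, versions in group_by_input(examples).items():
        pages = [
            {**eg, "page_title": eg.get("_annotator_id", f"annotator {i}")}
            for i, eg in enumerate(versions, 1)
        ]
        pages.append({
            "view_id": "choice",
            "page_title": "decision",
            "options": [
                {"id": eg.get("_annotator_id", str(i)), "text": f"Keep version {i}"}
                for i, eg in enumerate(versions, 1)
            ],
        })
        yield {"pages": pages, "_input_hash": input_hash}
```

The before_db callback would then look at the selected option on the decision page and write only that annotator's spans into the gold dataset.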
