Can you annotate Videos?

Is it possible to use prodi.gy for annotating videos? I want to replay about 10 hours of video. I need to be able to jump many frames back and forth and only label some frames.

Hi! You could probably build an interface like this using custom HTML templates and JavaScript, but Prodigy doesn’t have a built-in interface for playing and labelling videos. If you need to actually play videos, jump around frames etc., there are probably much more specialised tools that will do a much better job at this.

That said, in many cases, what you’re annotating in videos isn’t actually the live video, but rather every n-th frame as an image. That’s a workflow you can probably build quite easily with a custom recipe: write a Python script that loads the video and extracts every n-th frame as an image, yield out {"image": "..."} dicts for each task and add all other meta information (timestamp, file etc.) to the dict as well. You can then annotate it using the manual interface or any other annotation UI. When you export the annotations, the data will also include your meta information, so you’ll always be able to relate it back to the original file.
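
A very rough sketch of what such a script could look like – this assumes OpenCV (cv2) for reading the video, and the file name, step size and metadata are just placeholders:

import base64
import cv2  # assumption: OpenCV is installed (pip install opencv-python)

def video_frame_stream(video_path, every=100):
    """Yield every n-th frame of a video as a Prodigy task dict."""
    cap = cv2.VideoCapture(video_path)
    frame_no = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video
            break
        if frame_no % every == 0:
            # encode the frame as a JPEG data URI so the browser can display it
            _, buffer = cv2.imencode(".jpg", frame)
            data = base64.b64encode(buffer.tobytes()).decode("utf-8")
            yield {
                "image": "data:image/jpeg;base64," + data,
                "meta": {"file": video_path, "frame": frame_no},
            }
        frame_no += 1
    cap.release()

A generator like this can then be returned as the 'stream' from a custom recipe.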

Thank you very much. This makes sense. Actually, it would help to have two more buttons that let me jump ahead by, say, 100 frames instead of 1 frame. Could you kindly point me in the right direction where I can find more info on how to do it?

Thank you

Just to make sure I understand your question correctly: What exactly are you loading in? Videos or images? And what data do you expect to get out at the end?

The tasks you’re annotating in the UI should ideally be created in a previous process (preprocessing or at runtime in the recipe), so all you’d be collecting from the annotator is feedback / labels on the individual examples. If you (or the annotator) can just freely jump around the frames and create arbitrary annotated examples, this will make it very difficult to reproduce the decisions later on, and you’ll easily end up with inconsistent data.

If you want to annotate every 1000th frame, I’d recommend creating this data in the custom recipe or in a preprocessing step. I’m sure there are Python libraries (or other tools/services) that let you open a video file and extract frames at specific positions. You can then create the examples as one dict with image and metadata, and then stream that into Prodigy. In the annotation UI, all you then have to do is annotate.

Sorry if I am not clear. Let me try to explain what I aim to do. I am capturing the video feed from a horse barn. The feed is stored as individual jpg pictures. What I want to do is detect certain postures of the horses. Therefore, I must go through all pics, find the ones I am interested in and annotate them. Most of the time there is just a horse or no horse. My idea is to just have your web UI with an additional slider where I can jump back and forth, or alternatively some additional buttons to jump a certain number of pics back and forth. Is this something which could be achieved?

Thanks! With a concrete example, it’s definitely more clear. (Are you actually annotating horses? If so, that’s a pretty interesting use case :smiley:)

Prodigy typically expects you to make a decision for each incoming example. Since all of your images would be examples, you could skip some of them manually by clicking the “ignore” button. But if you want to skip 1000 at a time, that’s not very satisfying. You could probably build a “skip 1000 examples” button with a custom HTML interface and some JavaScript that just calls window.prodigy.answer('ignore') 1000 times. I haven’t tried this yet, but something like that should be possible. You can find more details on using custom JavaScript in your PRODIGY_README.html.

However, from what you describe, there might be better ways to automate this. One idea that came to mind, which would be very much in line with Prodigy’s philosophy: If there’s no horse, it’s usually pretty obvious, right? So I think even with a simple out-of-the-box image classification implementation (scikit-learn etc.), you should be able to train a model to predict HORSE pretty well. You can then use that as a pre-process in your annotation recipe and only send out frames for annotation if they have a HORSE score of > 0.5 (or whichever threshold you want to use). This means that you’re not wasting your time skipping through frames that don’t even have horses in them and you get to focus on the actual task. And even if your initial HORSE classifier produces 10% false negatives, you’ll probably still end up with plenty of examples to label for horse posture.

Chaining models together like this is something we often recommend for both annotation workflows and runtime applications – I’m also showing a similar approach in my FAQ video here (for text classification, but the same can be applied to images as well).

Thank you for that and sorry for all the newbie questions. What I’m trying now is to replicate an example I’ve found in the forum. Where do I need to store the custom recipe and how can I call it? Do I need to place it directly in the library or can I somehow reference it somewhere? Here is what I have.

#my_recipe.py
import prodigy
from prodigy.components.loaders import JSONL

with open('textcat_eval.html') as txt:
    template_text = txt.read()
with open('textcat_eval.js') as txt:
    script_text = txt.read()

@prodigy.recipe(
    'sentiment',
    dataset=prodigy.recipe_args['dataset'],
    file_path=("Path to texts", "positional", None, str),
)
def sentiment(dataset, file_path):
    """Annotate the sentiment of texts using different mood options."""
    stream = JSONL(file_path)     # load in the JSONL file
    return {
        'dataset': dataset,
        'stream': stream,
        'view_id': 'html',
        'config': {
            'html_template': template_text,
            'html_script': script_text,
        }
    }
<!--textcat_eval.html-->
<h2>{{text}}</h2>

<input type="text" class="input" placeholder="User text here..." />
<button onClick="updateFromInput()">Update</button>
<br />
{{user_text}}


//textcat_eval.js
function updateFromInput() {
    const text = document.querySelector('.input').value;
    window.prodigy.update({ user_text: text });
}

Could you kindly tell me what to do from here to get this thing running. Where to place the code files and how to call them?

Thanks

No worries! You can find more details on this in the “Custom recipes” section of your PRODIGY_README.html. The -F argument on the command line takes the path to a Python file. If you add that, you can call your custom recipe function just like any other built-in recipe. The recipe arguments become arguments on the command line. For example:

prodigy sentiment your_dataset /path/to/data.jsonl -F /path/to/my_recipe.py

Btw, regarding the HORSE vs. no HORSE classification filter, here’s a simple example of how you could set this up:
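
(A rough sketch – predict_horse() below stands in for whatever classifier you end up training, and 0.5 is just the threshold mentioned above.)

def filter_horses(stream, threshold=0.5):
    """Only send out examples that the classifier thinks contain a horse."""
    for eg in stream:
        score = predict_horse(eg["image"])  # placeholder for your trained HORSE model
        if score >= threshold:
            meta = eg.setdefault("meta", {})
            meta["horse_score"] = float(score)  # keep the score for reference
            yield eg

# in your recipe, after creating the stream:
# stream = filter_horses(stream)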

Thank you very much. I already have a model which filters horse and no-horse pictures. What I actually want is to capture when a horse is pooing at the right spot and give it a treat if it does so. Therefore, 98% of the time no pooing is happening :grinning:. The next problem is that when I try to frame it as a simple classification problem, it is hard for the AI to focus on the right region and I get low precision.

So now what I want to do is to save all the pictures by time and do a simple manual search for the interesting pictures and store them in a separate folder. I guess this is good enough for now.

But I now have another problem you may be able to help me with. When I run the image.manual recipe, the pictures are not yielded in alphabetical order. If you could tell me what the Image stream yields, I could try to write my own. Also, it seems that I cannot see the progress bar.

Here is what I found so far. I want to write a custom stream.

import prodigy
from prodigy.components.loaders import get_stream
from prodigy.components.preprocess import fetch_images


@prodigy.recipe(
    "image.manual",
    dataset=prodigy.recipe_args["dataset"],
    source=prodigy.recipe_args["source"],
    api=prodigy.recipe_args["api"],
    loader=prodigy.recipe_args["loader"],
    label=prodigy.recipe_args["label_set"],
    exclude=prodigy.recipe_args["exclude"],
    darken=("Darken image to make boxes stand out more", "flag", "D", bool),
)
def image_manual(
    dataset,
    source=None,
    api=None,
    loader="images",
    label=None,
    exclude=None,
    darken=False,
):
    """
    Manually annotate images by drawing rectangular bounding boxes or polygon
    shapes on the image.
    """
    prodigy.log("RECIPE: Starting recipe image.manual", locals())
    stream = get_stream(source, api=api, loader=loader, input_key="image")
    stream = fetch_images(stream)

    return {
        "view_id": "image_manual",
        "dataset": dataset,
        "stream": stream,
        "exclude": exclude,
        "config": {"labels": label, "darken_image": 0.3 if darken else 0},
    }

Thank you

Oh cool – definitely keep us updated on how you go! I'm very curious to hear if this works :smiley::horse:

Ah, that's strange – the Image loader doesn't do anything magical, really. It just opens the directory in Python and iterates over the files. I thought pathlib's iterdir did this in alphabetical order, but maybe I'm wrong.

A stream in Prodigy is a Python generator that yields dictionaries, one per annotation task. For images, it'll need at least a key "image", pointing to a local path, URL or a base64-encoded data URI. For example:

{
    "image": "some_image.jpg"
}

You can also include any other custom properties – those will be passed through and stored with the annotations in the database.

If you check out the "Annotation task formats" section in your PRODIGY_README.html, you'll find more details and examples. If images are specified as paths (local or URL), the fetch_images preprocessor in the recipe will convert them to base64 strings. This lets you store the image data with the annotations in the database and works around the fact that modern browsers will block and not display images from local paths.

The progress bar will be shown if the stream exposes a __len__, or if the recipe returns a custom progress function. Because streams are generators, they can theoretically be infinite, so Prodigy can't always report a progress by default. But if you know how many images you have, you could just make your custom image loader a class with a __len__, and the progress should show up as expected.

Btw, note that the progress is reported by the server, so you'll only see updates after a batch is sent back to the server. The default batch size is 10, but you can customise this in the "config" returned by your recipe.
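
For example, something along these lines (just a sketch – it assumes all your images sit in one flat directory):

from pathlib import Path

class ImageStream:
    """Image loader that exposes a __len__, so the progress bar can be shown."""

    def __init__(self, source):
        self.paths = sorted(Path(source).iterdir())

    def __len__(self):
        return len(self.paths)

    def __iter__(self):
        for path in self.paths:
            yield {"image": str(path), "meta": {"file": path.name}}

Keep in mind that if you pass this through another preprocessor like fetch_images, you'll probably get back a plain generator and the __len__ is hidden again – so you'd either want to do that conversion inside the class, or return a custom progress function from your recipe instead.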

Prodigy is a great tool! However, for videos I would recommend checking out UCI’s tool Vatic: http://www.cs.columbia.edu/~vondrick/vatic/

A nifty feature is that it can use linear approximation to draw boundaries in between a range of frames, so you don’t have to label every frame. It also allows you to split the 10 hours into subsets.


@seb This looks really good, thanks for sharing! That’s exactly the type of tool I meant when I mentioned the “more specialised tools” :+1:

Thank you for all the support @ines. I had to put a sorted() around it to get the files in the right order. Thanks for that.

for path in sorted(pathlib.Path(source).iterdir()):

I was also able to use the data and transform it so I can use it to train a PyTorch object detection model.
I have some other questions:

  • Is it possible to add span attributes to an image.manual task, so that for every task a bounding box and a class are already shown and I only need to amend the suggested boxes and labels? This would be such a great time saver. As I add more and more pictures, the boxes and quality would get better and better, and at some point maybe I would only need to work on the hard-to-label pictures.
  • In my custom recipe I go through all files. Does Prodigy recognize whether an image has already been processed for this dataset (after a shutdown), even if the stream returns all paths? Or do I need to take care of this?

Oh cool, glad to hear it's all working with PyTorch :slightly_smiling_face:

Yes, the manual image interface should respect pre-defined spans – so you can have your model assign the boxes and only correct the mistakes. The task with image "spans" should look like this:

{
    "image": "some_image.jpg",
    "width": 800,
    "height": 600,
    "spans": [
        {
            "label": "PERSON",
            "points": [[334, 14.5], [334, 88.6], [369, 88.6], [369, 14.5]]
        },
        {
            "label": "CAR",
            "points": [[47.5, 171.4], [47.5, 238.8], [156.6, 238.8], [156.6, 171.4]]
        }
    ]
}

For each image, you should provide the width and height, as well as a list of spans with the "label" and "points". The points are tuples of [x, y] pixel coordinates relative to the original image – so they should hopefully be pretty straightforward to generate from your model's output. Image spans can also take an optional "color" value, in case you want to define your own colour scheme.
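
For example, if your model gives you boxes as (x_min, y_min, x_max, y_max) – an assumption about your detector's output format – the conversion could look roughly like this:

def box_to_span(label, x_min, y_min, x_max, y_max):
    # the four corners of the box as [x, y] points, starting at the top left
    return {
        "label": label,
        "points": [[x_min, y_min], [x_min, y_max], [x_max, y_max], [x_max, y_min]],
    }

task = {
    "image": "some_image.jpg",
    "width": 800,
    "height": 600,
    "spans": [box_to_span("HORSE", 47.5, 171.4, 156.6, 238.8)],
}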

Btw, at the moment, you do have to delete and re-add boxes if you need to correct them entirely. You can change the label by highlighting the box, but there's currently no resize functionality. But it hopefully shouldn't be a problem, because they're quite easy to draw and you can speed up the process by using keyboard shortcuts.

Yes, this is determined by comparing the "_task_hash" property. It's a hash generated based on the raw input (e.g. the image) and the annotations (e.g. the spans). This allows Prodigy to distinguish between questions on the same data – because in a lot of scenarios, you might want to answer several questions about the same image, but never the same question.

If no hashes are present, Prodigy adds them – but you can also provide your own for each example. In your case, you probably want the "_task_hash" and "_input_hash" to be identical – if you've already annotated an image once, you don't want to see it again, not even if it has different bounding box suggestions.
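
One way to do that is to hash the image reference yourself, before the stream goes through any other preprocessors – a sketch, assuming an integer hash derived from the file path is fine for your use case:

import hashlib

def add_hashes(stream):
    """Set identical input and task hashes based on the image path only."""
    for eg in stream:
        digest = hashlib.md5(eg["image"].encode("utf-8")).hexdigest()
        image_hash = int(digest[:8], 16)  # Prodigy's hashes are integers
        eg["_input_hash"] = image_hash
        eg["_task_hash"] = image_hash
        yield eg

Run this before fetch_images, so you're hashing the file path rather than the base64 data.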