Support for object detection models (Turi Create)


(Espen Jütte) #1

A while ago Apple open sourced its MXNet-based Turi Create toolkit for building ML models, primarily for iPhone with Core ML. After playing around with the system, I wondered if this might be an interesting fit for the missing object detection/image models in Prodigy. Turi does a lot of legwork for image-based models, for example generating automatic image augmentations or letting you keep training an existing YOLO model. The functions are deceptively simple and it works surprisingly well. Any plans to support something like this in the future, or do you have your own plans for object detection/image classification?


(Ines Montani) #2

Thanks, that sounds pretty exciting – will definitely look into this! :sparkles:

For our first object detection experiments, we actually used the DarkNet library (or rather, our own Python port of it: LightNet). It’s still experimental, but you can test the YOLO models via the image.test recipe (see here).

Prodigy already comes with a pretty versatile image interface that supports "spans" consisting of (x, y) pixel coordinates relative to the image, as well as labels. It can render both rectangular and polygon shapes. The format it expects is very simple – so if you want to give it a go and play around with integrating Turi Create via a custom recipe, this could be super cool. The only part that turned out to be a little tricky was the updating.

But if you only want to stream in the model’s predictions and collect feedback on them, all you need to do is extract the bounding boxes and image dimensions into a format like this:

{
    "image": "some_image.jpg",
    "width": 800,
    "height": 600,
    "spans": [{
        "label": "PERSON",
        "color": "magenta",
        "points": [[150,80], [270,100], [250,200], [170,240], [100, 200]]
    }]
}
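If your detector reports boxes as a top-left corner plus width and height (the common bounding-box convention), converting them to the corner points above is just arithmetic. A minimal sketch – the helper name `bbox_to_span` is made up for illustration and isn’t part of Prodigy’s or Turi Create’s API:

```python
def bbox_to_span(label, x, y, width, height, color="magenta"):
    """Convert a (x, y, width, height) bounding box into a Prodigy
    image span with the four rectangle corners as (x, y) points."""
    return {
        "label": label,
        "color": color,
        "points": [
            [x, y],                    # top-left
            [x + width, y],            # top-right
            [x + width, y + height],   # bottom-right
            [x, y + height],           # bottom-left
        ],
    }

task = {
    "image": "some_image.jpg",
    "width": 800,
    "height": 600,
    "spans": [bbox_to_span("PERSON", 150, 80, 120, 160)],
}
```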

If you wrap an image stream in the fetch_images preprocessor, Prodigy will load images from paths and URLs, and convert them to base64-encoded data URIs. This allows the image data to be passed around the application, and to be stored in the database together with the annotations. You can also easily decode the image data to bytes to process it with your model.

from prodigy.components.preprocess import fetch_images
stream = fetch_images(stream)
{"image": "some_image.jpg"}  # before
{"image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."} # after

I’ve also started experimenting with manual image annotation interfaces – similar to the new ner_manual. This will be a little more complex, though, so we don’t have an exact timeline for it yet.


(Espen Jütte) #3

Really cool! I was thinking of outputting Turi Create results to Prodigy for evaluation, but I’m waiting for Turi to fully support Python 3. As you mention, though, the hard (and indeed cool) part would of course be getting model updating to work.

Being able to evaluate, fine-tune and train domain-specific YOLO models in Prodigy would be really cool. Turi Create takes a lot of the work out of training the initial model, but there is still a good bunch of iterating back and forth between predictions and training data to figure out where the model is error-prone and to add more data for the specific cases where it’s doing a poor job. I think Prodigy could help immensely with that.