Using Prodigy to train a new Computer Vision object detection model

Hi!

I’ve begun taking a look at Prodigy’s computer vision abilities. I see that it works with LightNet, which has a set of ~80 objects it can detect. It seems like you can update Prodigy to work with any of those existing objects/categories, but how would I train the model to recognize an entirely new object? For example, say I want to build an object detection model for a stapler. Would I try to adapt one of the existing categories to this new definition, or would I try to add a new category entirely?

Sorry if this is outside the scope of Prodigy! Thanks

Yes, this is definitely possible – you’ll just have to plug in your own model implementation. The LightNet codebase (our Python port of DarkNet) is still very experimental. We did use it internally to try an active learning workflow, and it looked promising, but it’s not production-ready and still fairly brittle. That’s why there’s currently no built-in image.teach recipe, only the image.test recipe, which lets you try out the image interface on your own data.

Instead of LightNet, you probably want to use a different and more stable solution, like a PyTorch or TensorFlow implementation. Your model should provide two main methods: a predict function that takes a stream of examples, gets a prediction for each one and yields (score, example) tuples, and an update callback that takes a list of annotated examples and updates the model accordingly.
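
As a rough sketch of that interface: the class below is a hypothetical stand-in, not part of Prodigy or any real detector. The name DummyDetector, the random scores and the answer counter are all placeholders for a real model's inference and training step.

```python
import random


class DummyDetector:
    """Hypothetical placeholder for a real PyTorch/TensorFlow detector."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.seen = {"accept": 0, "reject": 0}

    def predict(self, stream):
        # Yield (score, example) tuples so a sorter can rank the examples.
        for eg in stream:
            score = self.rng.random()  # placeholder for a real confidence
            yield (score, eg)

    def update(self, answers):
        # Annotated examples carry an "answer" key such as "accept" or "reject".
        for eg in answers:
            answer = eg.get("answer")
            if answer in self.seen:
                self.seen[answer] += 1
        # A real model would run a training step on the answers here.
```

In a real implementation, predict would run inference on each image and update would turn the accepted and rejected boxes into a gradient step.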

Prodigy also includes several helper functions that are useful for working with images – like the Images loader and the fetch_images pre-processor, which takes a stream of image paths and even URLs, and converts them to base64-encoded data URIs. This lets you store the image data with the annotation tasks. For more details and API docs see the PRODIGY_README.html. Your custom recipe could then look something like this:

import prodigy
from prodigy.components.loaders import Images
from prodigy.components.preprocess import fetch_images
from prodigy.components.sorters import prefer_uncertain

@prodigy.recipe('image.teach')
def image_teach(dataset, source):
    model = LOAD_YOUR_MODEL_HERE()
    stream = Images(source)  # path to a directory with images
    stream = fetch_images(stream)  # convert paths to base64-encoded data URIs
    
    return {
        'dataset': dataset,
        'stream': prefer_uncertain(model.predict(stream)),  # the sorted stream
        'update': model.update,  # method to update the model
        'view_id': 'image'
    }

There’s also a currently undocumented prodigy.util.b64_uri_to_bytes function that takes an encoded image and returns a bytes object. This should make it easy for your model to consume the image data. You can find more details on the image annotation task formats in the README. Essentially, your recipe/model should take the image and produce tasks of the following, simple format:

{
    "image": "data:image/png;base64,iVBORw0KGgoAAAANSUh...",
    "width": 800,
    "height": 600,
    "spans": [{
        "label": "PERSON",
        "points": [[150,80], [270,100], [250,200], [170,240], [100, 200]]
    }]
}
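
On the model side, the "image" value of a task like this can be turned back into raw bytes. The sketch below uses only Python's standard library and assumes the value is a base64-encoded data URI as in the example. The function name data_uri_to_bytes is made up for illustration and is not Prodigy's own helper.

```python
import base64


def data_uri_to_bytes(data_uri):
    """Decode a 'data:<mime>;base64,<payload>' string into raw bytes."""
    header, sep, encoded = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    return base64.b64decode(encoded)
```

The resulting bytes can then be passed to whatever image loading code your model already uses.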

Each span describes a bounding box, and "points" is a list of [x, y] coordinate pairs. Object detection models generally use rectangular bounding boxes – but as far as Prodigy is concerned, you can also easily render more complex polygon shapes.
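
If your detector outputs rectangles, converting them to this format is straightforward. A minimal sketch, assuming the model reports each box as a top-left corner plus width and height in pixels (the helper name box_to_span is hypothetical):

```python
def box_to_span(x, y, width, height, label):
    """Turn an (x, y, width, height) rectangle into a span with four corner points."""
    return {
        "label": label,
        "points": [
            [x, y],                   # top left
            [x + width, y],           # top right
            [x + width, y + height],  # bottom right
            [x, y + height],          # bottom left
        ],
    }
```

If your model reports boxes as two corners or as a centre point plus size, only the arithmetic inside the helper changes.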

Bootstrapping a new category is definitely the “hard part” – you’ll either need some already annotated data to pre-train your model, or find a model that already predicts something that’s at least similar to what you’re looking for, so you can improve it.

Prodigy should also come in very handy here, because it’ll let you skip through your examples quickly and visualise the model’s predictions. So you can load in your stream of stapler images, and see what your pre-trained model predicts. Maybe it turns out that it sometimes detects your staplers as something else – like BOWL. So using a recipe like the one I outlined above, you can try and teach your model a new concept of that category, by only accepting staplers and rejecting everything else. How well that works will depend on the pre-trained model and your input data. But Prodigy should definitely make it easier to experiment with this approach.

We’re also currently working on a manual image annotation interface that will let you create image annotations from scratch, by drawing the bounding boxes yourself. This is still under development, though, and we don’t yet have an ETA.

We also believe that ideally, manual annotation is something you should only have to resort to for evaluation data and edge cases. In Prodigy, we’re trying to reframe the problem and come up with more efficient solutions – for example, the patterns approach in ner.teach. Bootstrapping a new category with abstract examples like this is a little more difficult for images – we do have some ideas, but we haven’t really tested any of those in detail yet. (For now, our main focus has been to build out and finalise Prodigy’s NLP capabilities and solutions.)

If you do end up experimenting with plugging in a custom image model, definitely keep us updated on your progress – I’m very curious to hear how it works on different problems and use cases :grinning:

The demo looks great, I’m looking forward to its release! That would definitely be helpful with bootstrapping the new category and creating pre-annotated data.

Thanks for the push in the right direction, I’ll definitely post here again if I get something working!

Cool, thanks! :+1:

I just had another idea for bootstrapping the initial training examples: You could also try and find a pre-trained object detection model and run it with a fairly low threshold, i.e. make it find as many bounding boxes as possible. Chances are that it will also detect your staplers (if they’re isolated enough) and assign some random category to them.

You can then run your recipe without an update callback, create one annotation example for each bounding box and simply accept all boxes containing staplers (regardless of the category) and reject everything else. You might have to click through a lot of boxes, but because it’s only a binary decision and one click / key press, it should be pretty fast. (The decision is so visually obvious, you can easily get to < 1 second per click, once you’re in a good flow.)
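
The "one annotation example for each bounding box" step could be sketched like this. It assumes each incoming task already has the model's predicted boxes in "spans", in the task format shown earlier in the thread. The function name is hypothetical.

```python
def one_task_per_box(stream):
    """Split tasks with several predicted boxes into one task per box."""
    for task in stream:
        for span in task.get("spans", []):
            new_task = dict(task)  # shallow copy, the image data is shared
            new_task["spans"] = [span]
            yield new_task
```

Each resulting task shows a single box, so accepting or rejecting it is an unambiguous binary decision.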

When you’re done, you can export your dataset with --answer accept (to only extract the accepted boxes) and you’ll have a training set of staplers :tada:

prodigy db-out staplers_dataset /tmp --answer accept
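
To use the export for training, something along these lines should work. It is only a sketch and assumes the export is a JSONL file with one annotated task per line. The path you pass in depends on where db-out wrote the file, and the function name is made up for illustration.

```python
import json


def load_accepted_spans(path):
    """Collect (image, span) pairs from accepted tasks in a JSONL export."""
    pairs = []
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            task = json.loads(line)
            # Keep the check even after --answer accept, in case of a full export.
            if task.get("answer") != "accept":
                continue
            for span in task.get("spans", []):
                pairs.append((task.get("image"), span))
    return pairs
```

Since every accepted box contains a stapler regardless of the label the pre-trained model assigned, you would typically overwrite each span's "label" with your new category before training.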