Using Prodigy to train a new Computer Vision object detection model

Yes, this is definitely possible – you'll just have to plug in your own model implementation. The LightNet codebase (our Python port of DarkNet) is still very experimental. We did use it internally to try out an active learning workflow, and it looked promising, but it's not production-ready and still fairly brittle. That's why there's currently no built-in image.teach recipe, only the image.test recipe to try out the image interface on your own data.

Instead of LightNet, you probably want to use a different, more stable solution, like a PyTorch or TensorFlow implementation. Your model should provide two main methods: a predict function that takes a stream of examples, gets a prediction for each one and yields (score, example) tuples, and an update callback that takes a list of annotated examples and updates the model accordingly.
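
In other words, something along these lines – just a minimal sketch of the expected interface, where LOAD_YOUR_MODEL_HERE, score_image and train_on_examples are hypothetical placeholders for your own implementation:

class ObjectDetectionModel(object):
    """Minimal wrapper exposing the two methods Prodigy needs."""

    def __init__(self):
        # hypothetical placeholder – load your PyTorch/TensorFlow model here
        self.model = LOAD_YOUR_MODEL_HERE()

    def predict(self, stream):
        for eg in stream:
            # score_image is hypothetical: run your model on eg['image']
            # and return a confidence score plus the predicted spans
            score, spans = self.score_image(eg)
            eg['spans'] = spans
            yield (score, eg)  # the sorters expect (score, example) tuples

    def update(self, examples):
        # examples are the annotated tasks, each with an "answer" key
        # ("accept", "reject" or "ignore") – update your weights here
        self.train_on_examples(examples)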

Prodigy also includes several helper functions that are useful for working with images – like the Images loader and the fetch_images pre-processor, which takes a stream of image paths (or even URLs) and converts them to base64-encoded data URIs. This lets you store the image data with the annotation tasks. For more details and API docs, see the PRODIGY_README.html. Your custom recipe could then look something like this:

import prodigy
from prodigy.components.loaders import Images
from prodigy.components.preprocess import fetch_images
from prodigy.components.sorters import prefer_uncertain

@prodigy.recipe('image.teach')
def image_teach(dataset, source):
    model = LOAD_YOUR_MODEL_HERE()  # your model, providing predict and update
    stream = Images(source)  # path to a directory with images
    stream = fetch_images(stream)  # convert paths to base64-encoded data URIs
    
    return {
        'dataset': dataset,
        'stream': prefer_uncertain(model.predict(stream)),  # the sorted stream
        'update': model.update,  # method to update the model
        'view_id': 'image'
    }
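
Assuming you save this as recipe.py, you should then be able to start the server by pointing Prodigy to the file via the -F flag, e.g.:

prodigy image.teach your_dataset /path/to/images -F recipe.py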

There's also a currently undocumented prodigy.util.b64_uri_to_bytes function that takes an encoded image and returns a bytes object. This should make it easy for your model to consume the image data. You can find more details on the image annotation task formats in the README. Essentially, your recipe/model should take the image and produce tasks of the following simple format:

{
    "image": "data:image/png;base64,iVBORw0KGgoAAAANSUh...",
    "width": 800,
    "height": 600,
    "spans": [{
        "label": "PERSON",
        "points": [[150,80], [270,100], [250,200], [170,240], [100, 200]]
    }]
}
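
To get from the "image" value back to raw bytes inside your predict function, you could use the helper mentioned above – here's a rough sketch, where the Pillow part is just one possible way to turn the bytes into something your model can consume:

from io import BytesIO

from PIL import Image
from prodigy.util import b64_uri_to_bytes

def load_image(task):
    image_bytes = b64_uri_to_bytes(task['image'])  # data URI -> bytes
    return Image.open(BytesIO(image_bytes))        # e.g. open with Pillow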

Each span describes a bounding box, and "points" is a list of [x, y] coordinate pairs. Object detection models generally use rectangular bounding boxes – but as far as Prodigy is concerned, you can also easily render more complex polygon shapes.
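
For example, if your model predicts rectangular boxes as (x, y, width, height), converting them to the "points" format is just a matter of listing the four corners:

def box_to_points(x, y, width, height):
    # clockwise from the top-left corner
    return [[x, y], [x + width, y],
            [x + width, y + height], [x, y + height]]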

Bootstrapping a new category is definitely the "hard part" – you'll either need some already-annotated data to pre-train your model, or a model that already predicts something at least similar to what you're looking for, so you can improve it.

Prodigy should also come in very handy here, because it lets you skip through your examples quickly and visualise the model's predictions. So you can load in your stream of stapler images and see what your pre-trained model predicts. Maybe it turns out that it sometimes detects your staplers as something else – like BOWL. Using a recipe like the one outlined above, you can then try to teach your model a new concept for that category by accepting only the staplers and rejecting everything else. How well that works will depend on the pre-trained model and your input data, but Prodigy should definitely make it easier to experiment with this approach.

We're also currently working on a manual image annotation interface that will let you create image annotations from scratch, by drawing the bounding boxes yourself. This is still under development, though, and we don't yet have an ETA.

We also believe that, ideally, manual annotation is something you should only have to resort to for evaluation data and edge cases. In Prodigy, we're trying to reframe the problem and come up with more efficient solutions – for example, the patterns approach in ner.teach. Bootstrapping a new category with abstract examples like this is a little more difficult for images – we do have some ideas, but we haven't really tested any of them in detail yet. (For now, our main focus has been to build out and finalise Prodigy's NLP capabilities and solutions.)

If you do end up experimenting with plugging in a custom image model, definitely keep us updated on your progress – I'm very curious to hear how it works on different problems and use cases :grinning:
