Image Classification Example?


Looking at the example recipes, I see there is a documented recipe for creating bounding box annotations within images. Is there a similar recipe for regular image labeling? E.g. "This is a photo of X", where you are classifying the entire image rather than bounding boxes within it?

It isn't clear from the documentation (at least at this point) how we would be able to build a custom recipe to use something like MobileNet on TF to learn from our annotations on Prodigy.

I primarily bought a license hoping Prodigy would help me with an image classification problem, but the solution does not seem obvious at this point.

Any help would be appreciated.

Hi! The specifics of how you integrate the model of course come down to the implementation, but ultimately, you probably want to use the classification interface and stream in examples that look like this:

{"image": "", "label": "SOME_LABEL"}
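As a rough sketch of how such a stream could be produced, the generator below walks a directory and yields one task per image in the format above. The helper names, the directory layout and the choice to embed each image as a base64 data URI (rather than a URL) are assumptions for illustration, not part of Prodigy's API.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path):
    """Encode an image file as a base64 data URI (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(str(path))
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type or 'application/octet-stream'};base64,{data}"


def make_classification_stream(image_dir, label, extensions=(".jpg", ".jpeg", ".png")):
    """Yield one {"image": ..., "label": ...} task per image file in image_dir."""
    for path in sorted(Path(image_dir).iterdir()):
        # Skip anything that does not look like an image file.
        if path.suffix.lower() in extensions:
            yield {"image": image_to_data_uri(path), "label": label}
```

Each yielded task asks a single accept/reject question about the whole image: does `label` apply or not?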

If you want to assign multiple labels at the same time, you could also use the choice interface and do something similar to this example, just with an image instead of a text. You can find more details on the data formats in the "Annotation task formats" section of your PRODIGY_README.html, btw.
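A minimal sketch of what an image task for the choice interface might look like is shown below. The `"options"` list of `id`/`text` pairs and the `"accept"` list on the returned answer reflect my understanding of the choice format, so please check them against the README section mentioned above. Both function names are made up for this example.

```python
def make_choice_task(image, labels):
    """Build one multiple-choice task for a single image (hypothetical helper)."""
    return {
        "image": image,
        # One selectable option per candidate label.
        "options": [{"id": label, "text": label} for label in labels],
    }


def selected_labels(answer):
    """Return the option ids picked by the annotator, or [] if not accepted."""
    if answer.get("answer") != "accept":
        return []
    return list(answer.get("accept", []))
```

Allowing more than one option to be selected per image is, as far as I know, controlled by the recipe's config rather than by the task itself.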

To label with a model in the loop, you essentially want two main components in your recipe: a function that uses the model to make predictions and yields scored examples, and an update callback that receives answers and updates the model. You might find this example recipe useful: it shows this using a dummy model that "predicts" random numbers.
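To make the two components concrete, here is a self-contained sketch with a placeholder model. It is not the recipe referred to above: `DummyImageModel`, `most_uncertain_first` and `build_components` are hypothetical names, the scores are random, and the sorter is a simple stand-in for Prodigy's own sorters. In a real recipe the returned dictionary would come from a function registered with `@prodigy.recipe`.

```python
import random


class DummyImageModel:
    """Placeholder for a real image classifier; its scores are random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.n_updates = 0

    def predict(self, stream):
        """Yield (score, example) tuples for every incoming task."""
        for example in stream:
            yield self.rng.random(), example

    def update(self, answers):
        """Receive a batch of annotated tasks; a real model would train here."""
        self.n_updates += len(answers)


def most_uncertain_first(scored_stream, batch_size=10):
    """Within each batch, emit the tasks whose score is closest to 0.5 first."""
    batch = []
    for item in scored_stream:
        batch.append(item)
        if len(batch) == batch_size:
            batch.sort(key=lambda pair: abs(pair[0] - 0.5))
            yield from (example for _, example in batch)
            batch = []
    # Flush whatever is left once the stream is exhausted.
    batch.sort(key=lambda pair: abs(pair[0] - 0.5))
    yield from (example for _, example in batch)


def build_components(dataset, stream, model):
    """Assemble the dictionary a recipe function would return."""
    return {
        "dataset": dataset,
        "view_id": "classification",
        "stream": most_uncertain_first(model.predict(stream)),
        "update": model.update,
    }
```

Because the stream is a generator, predictions for later tasks are only made when those tasks are requested, so they can reflect any updates the model has received in the meantime.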

In your case, that'd be an image model instead of a text classifier. One thing to keep in mind when choosing a model implementation is that you want your model to be sensitive enough to updates and small batches. After you submit one or two batches, you ideally already want to see a result and see different suggestions. That's not always the default configuration for computer vision models, as they typically expect to be updated with larger batches. (On the other hand, of course, you also don't want it to be too sensitive so that one small mistake immediately ruins your model.)
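One possible way to get this kind of responsiveness, offered here only as an assumption and not as a recommendation from Prodigy, is to keep a pretrained network frozen and train just a small linear head on its feature vectors, updating it after every batch of answers. The sketch below uses plain Python and a hypothetical `OnlineLogisticHead` class; the feature vectors are assumed to come from whatever feature extractor you choose.

```python
import math


class OnlineLogisticHead:
    """Binary logistic-regression head updated with one gradient step per batch."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return the probability that the label applies to these features."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, batch):
        """batch: list of (features, target), target 1.0 = accept, 0.0 = reject."""
        if not batch:
            return
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        for features, target in batch:
            error = self.predict(features) - target
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Average the gradient over the batch and take one step.
        for i in range(len(self.weights)):
            self.weights[i] -= self.learning_rate * grad_w[i] / len(batch)
        self.bias -= self.learning_rate * grad_b / len(batch)
```

The learning rate is the knob for the trade-off described above: a larger value makes suggestions change after fewer answers, but also lets a single wrong answer move the model further.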


Your response is super-helpful.

It does, however, raise the question of the resource requirements of the machine this is running on. Is there guidance for that?

Additionally, are you aware of any solid vision models that are that sensitive? Even using transfer learning approaches from smaller models like MobileNet, I’m not sure 2-3 training batches would get it done. Are you aware of any good examples elsewhere on the internet that show a practical implementation of what you have described?

Regarding choice of vision models, you might find the comparison in the explosion/prodigy-recipes repository on GitHub helpful. It was done as part of an object detection example.

I do think you should be able to find an image classification model that works acceptably on CPU. The benchmarks in the repo refer to object detection, so I think classification should be a bit more efficient.

Regarding the batch sizing and number of examples to update: ultimately you'll need to try this out on your problem and see what's working well.

Thank you @honnibal !

Responses and feedback have been stellar here. So cool to see a project that has an actually active community supporting it.