LLM for object detection in images

Hi,

The OpenAI API now accepts image URLs in chat completions. GPT-4 can load the URL and describe what's in the image, and it can be further prompted to look for specific objects and return the output in a given format.

OpenAI will return a true or false flag indicating the presence of each object. These flags would then be used to pre-fill the image annotations.
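For example, a call along these lines (the model name, object list, and URL below are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask a vision-capable model whether specific objects appear in the
# image, and request a machine-readable true/false answer per object.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "For each of these objects, answer true or false "
                        "depending on whether it appears in the image: "
                        "hard hat, safety vest, ladder. "
                        'Reply as JSON, e.g. {"hard hat": true, ...}'
                    ),
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/site-001.jpg"},
                },
            ],
        }
    ],
    max_tokens=200,
)
print(response.choices[0].message.content)
```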

Is this something that's possible to do with the current Prodigy LLM implementation?

Secondly, we would like to consider OpenAI as an annotator and validate its predictions in review mode. In other words, we could generate predictions for all images with OpenAI and write them directly to the examples table under an openai-1234 session ID.

Can we do this directly in Prodigy?
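For illustration, something like this is what we have in mind (an untested sketch using Prodigy's database API; the dataset name, labels, and session ID are made up):

```python
from prodigy import set_hashes
from prodigy.components.db import connect

# Hypothetical pre-computed predictions -- in practice these would come
# from the OpenAI calls above, one task per image/object decision.
examples = [
    {
        "image": "https://example.com/site-001.jpg",
        "label": "HARD_HAT",
        "answer": "accept",  # GPT-4 said the object is present
        "_session_id": "construction-images-openai-1234",
        "_annotator_id": "openai-1234",
    }
]
examples = [set_hashes(eg) for eg in examples]

db = connect()  # uses the database settings from prodigy.json
if "construction-images" not in db:
    db.add_dataset("construction-images")
db.add_examples(examples, datasets=["construction-images"])
```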

Hi @SiteAssist,

Thanks for your question.

We don't currently have that functionality in our built-in recipes for handling images, as it's such a new feature. I've added an internal ticket to look into it.

Your best option is probably to develop a custom recipe, perhaps extending the existing built-in recipes. If you weren't aware, you can view the existing recipes within your Prodigy install: find where the package is installed (run prodigy stats and check Location:), then look for the recipes/llm folder.
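As a rough sketch of what such a custom recipe could look like (the recipe name, label, and pre-fill logic are placeholders; you'd plug in your OpenAI call where indicated):

```python
import prodigy
from prodigy.components.loaders import Images

@prodigy.recipe(
    "image.gpt.flags",  # hypothetical recipe name
    dataset=("Dataset to save answers to", "positional", None, str),
    source=("Directory of images to load", "positional", None, str),
    label=("Label to ask GPT-4 about", "option", "l", str),
)
def image_gpt_flags(dataset, source, label="HARD_HAT"):
    def add_model_flags(stream):
        for eg in stream:
            # Call the OpenAI API here (see the snippet in the question)
            # and attach the model's true/false flag to the task.
            eg["label"] = label
            eg["meta"] = {"gpt4_flag": True}  # placeholder for the real answer
            yield eg

    stream = Images(source)  # image tasks with base64-encoded "image" keys
    return {
        "dataset": dataset,
        "stream": add_model_flags(stream),
        "view_id": "classification",  # binary accept/reject of the label
    }
```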

We do have something very similar with our model-as-annotator recipes, but those are for text tasks like ner, textcat, and spancat. The review interface doesn't work with images or audio, as mentioned in the review docs:

In particular, the image_manual and audio_manual interfaces aren’t supported because the very nature of the UI makes it hard to combine annotations. These interfaces allow users to draw shapes and these may differ due to small differences in pixel values. That doesn’t allow for a great review experience which is why these aren’t supported.

You could likely still do something creative for a review-like interface with the custom interfaces, combining them with blocks (e.g., adding a text_input to correct the text description). This would be a bit like the whisper plugin, except instead of the audio interface you'd use "html" (e.g., if you only want to show the image) and correct GPT-4's image description.
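For example, a skeleton of that blocks idea (the recipe name and the hard-coded task are made up; in practice you'd stream in the examples written by the OpenAI pass):

```python
import prodigy

@prodigy.recipe(
    "image.gpt.review",  # hypothetical recipe name
    dataset=("Dataset to save answers to", "positional", None, str),
)
def image_gpt_review(dataset):
    # One hard-coded task for illustration. The "description" value is
    # pre-populated into the text_input because its key matches field_id.
    stream = [
        {
            "html": "<img src='https://example.com/site-001.jpg' width='100%' />",
            "description": "A worker on a ladder wearing a hard hat.",
        }
    ]
    blocks = [
        {"view_id": "html"},  # just the image, no drawing tools
        {
            "view_id": "text_input",
            "field_id": "description",
            "field_label": "Correct the GPT-4 description",
        },
    ]
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",
        "config": {"blocks": blocks},
    }
```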

Hope this helps!