Annotation for binary image segmentation in Prodigy?


I'm wondering whether labelling for binary image segmentation is possible in Prodigy. The use case is simple: I have images with background pixels and foreground pixels. I'd like to draw a rectangle around the foreground pixels and have Prodigy generate image masks or run-length encoded strings as output files.

Is this even possible and if yes can someone guide me through this task?

This will be much appreciated.


I'm currently not aware of Prodigy being able to turn the drawn shapes into image masks. You'd need a separate, custom Python script for that. That said, I think the Python imaging library Pillow (imported as PIL) can be very helpful here.

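import numpy as np
from PIL import Image

# Load the image and make sure it has three RGB channels.
img = Image.open("path/to/img.png").convert("RGB")
# Convert to an array of shape (height, width, 3).
arr = np.array(img)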

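This gives you a NumPy array arr that represents the image's RGB values. If you have a simple shape, like a rectangle, then you might be able to assign a mask via:

# Same shape as arr, so the mask covers all three channels.
mask = np.zeros_like(arr)
# Rows 20-299 and columns 40-199 are marked as foreground.
mask[20:300, 40:200] = 1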
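Since the original question also mentions run-length encoded strings, here is a minimal sketch of a single-channel variant of the mask above together with one possible run-length encoding. The helper names rect_mask and rle_encode are made up for this illustration and are not part of Prodigy. The encoding here is row-major with 0-indexed (start, length) pairs; other tools (for example some competition formats) use different conventions, so check what your downstream code expects.

```python
import numpy as np


def rect_mask(height, width, top, bottom, left, right):
    """Return a 2-D uint8 mask with 1 inside the rectangle and 0 elsewhere."""
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[top:bottom, left:right] = 1
    return mask


def rle_encode(mask):
    """Run-length encode a binary mask in row-major order.

    Returns a list of (start, length) pairs, where start is a 0-indexed
    position in the flattened mask. This convention is an assumption.
    """
    flat = mask.ravel()
    # Pad with zeros so runs touching either end are detected as well.
    padded = np.concatenate([[0], flat, [0]])
    # Positions where the value flips between 0 and 1.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[::2], changes[1::2]
    return [(int(s), int(e - s)) for s, e in zip(starts, ends)]


mask = rect_mask(4, 5, top=1, bottom=3, left=1, right=4)
runs = rle_encode(mask)
```

A 2-D mask is usually what segmentation tooling expects for a binary problem, which is why this sketch drops the colour channels.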

I hope this helps!

Thanks @koaning. This is very helpful. Would it be possible to create a custom recipe that runs such a transformation seamlessly to create masks and store them in the db, so that exporting labels from Prodigy exports the masks?

Hi Bilal,

there's certainly nothing stopping you from doing that. You could store the masked numpy representation back into the database as an image. But there can be a very good reason not to do that: disk space.

Have you seen our latest Prodigy short on how images are stored in Prodigy?

After watching the video, I hope you recognize that while it is possible to store the masked image, it may be much more lightweight to store the boxed coordinates instead.

Thanks, @koaning, for sharing the video and tips. It is very useful. I won't store images in the database, as you suggested, but will rather export the rectangles in JSON format and then use custom Python code to generate the masks.
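For readers following the same route, a rough sketch of that export-then-convert step could look like the code below. It assumes each exported task carries "width" and "height" keys and a "spans" list whose entries have a "points" list of [x, y] corners; verify this against your own db-out export before relying on it. The function names are made up for this example. The mask is filled from the bounding box of the points, which is exact for rectangles but only an approximation for free-form polygons.

```python
import json

import numpy as np


def spans_to_mask(task):
    """Build a binary mask for one exported task (assumed task layout)."""
    mask = np.zeros((task["height"], task["width"]), dtype=np.uint8)
    for span in task.get("spans", []):
        xs = [p[0] for p in span["points"]]
        ys = [p[1] for p in span["points"]]
        # Round to whole pixels and keep the indices non-negative.
        left, right = max(int(round(min(xs))), 0), max(int(round(max(xs))), 0)
        top, bottom = max(int(round(min(ys))), 0), max(int(round(max(ys))), 0)
        mask[top:bottom, left:right] = 1
    return mask


def masks_from_jsonl(path):
    """Yield (task, mask) pairs for accepted examples in a JSONL export."""
    with open(path, encoding="utf8") as f:
        for line in f:
            if not line.strip():
                continue
            task = json.loads(line)
            if task.get("answer", "accept") == "accept":
                yield task, spans_to_mask(task)


# Tiny in-memory example with one rectangle.
example = {
    "width": 6,
    "height": 4,
    "spans": [{"label": "fg", "points": [[1, 1], [1, 3], [4, 3], [4, 1]]}],
}
example_mask = spans_to_mask(example)
```

Each mask can then be saved with Pillow or run-length encoded, so only the lightweight coordinates need to live in the database.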

Based on this discussion, I've started annotating the images. I have 25,000 images to annotate and they are pretty much the same. I draw a bounding box around the foreground, which is predominantly the main part of the image, then accept the label and move on to the next image. It works nicely but is very slow.

Forgive my naive question. It might be straightforward, but I am new to Prodigy.

Is there a way to label one image and have the bounding boxes automatically created for all the subsequent images in the Prodigy UI? Most image labelling systems support this feature. It can save annotators a lot of time, since people only need to adjust the bounding boxes slightly on each image while reviewing them.

Do you know how that feature can be enabled in the Prodigy labelling UI?


Ah! That's a good question.


Before showing the solution, it helps to show how you would need to prepare the data upfront. So I'll label one example from a dataset I have locally.

python -m prodigy image.manual cat-demo images/ --label noa,sok --remove-base64

The dataset has pictures of my cats, and here you see an annotated example of both.

I hit "accept," and after that, I make sure that I also hit the save button.

I can now export this annotated example as JSON.

python -m prodigy db-out cat-demo > example_annot.jsonl 

The single example in this file looks like this:


Notice how the spans contain the coordinates? This is how Prodigy stores the annotation, but it's also what we need to supply Prodigy if we want it to draw the coordinates.

Passing these along

You can pass this file with annotations to Prodigy; you just have to make it aware that you're no longer giving it a folder of images. Notice in the command below that I'm passing --loader jsonl.

python -m prodigy image.manual cat-demo example_annot.jsonl --label noa,sok --remove-base64 --loader jsonl

And it gives an interface that has the annotations pre-filled.

Adding annotations yourself

You can use your own ML model to pre-fill the example_annot.jsonl file for Prodigy with spans that select the item you're interested in. Prodigy gives you full control in how you pre-fill these spans, but you will need to supply your own ML model.
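If the images are all very similar, a simple alternative to a model is to reuse the spans from one annotated example as the starting point for every other image. The sketch below shows that idea under some assumptions: the function names are invented for this example, the span dictionaries should be copied from your own db-out export rather than written by hand, and whether the "image" value may be a local file path (rather than a URL or base64 string) depends on your Prodigy version and setup.

```python
import copy
import json


def prefill_tasks(template_spans, image_paths):
    """Yield one task per image, each with its own copy of the template spans."""
    for path in image_paths:
        yield {"image": str(path), "spans": copy.deepcopy(template_spans)}


def write_jsonl(tasks, out_path):
    """Write tasks to a JSONL file, one JSON object per line."""
    with open(out_path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


# Hypothetical template; in practice, copy the "spans" of an exported example.
template = [{"label": "noa", "points": [[10, 10], [10, 90], [120, 90], [120, 10]]}]
tasks = list(prefill_tasks(template, ["images/a.jpg", "images/b.jpg"]))
```

The resulting file can be passed to image.manual with --loader jsonl as shown above, so annotators only have to nudge the pre-drawn boxes.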

Thanks, Vincent. From your explanation, I'm getting the hang of how Prodigy works. I will try it myself, and if I get stuck on something I will let you know.


Hi @koaning

Is there a way to modify the saved labels from previous labelling sessions, or even the current one? By default, Prodigy allows you to change labels that have not been saved to the database yet. What if I want to edit the ones that have already been posted to the database? They could be from the current session or from an earlier session after which the server was stopped.

How would you recommend doing this?

Thanks and Warm Regards

I think it should be sufficient to unlink an example from the dataset and to add the improved example afterward. But I would be very careful here. If you find yourself fixing old annotations, then probably there's something about your annotation process that needs to be fixed instead.

You can also consider having annotations go through a second review phase before you have your final dataset.
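As an illustration of swapping an old example for an improved one, here is a small sketch that works on an exported file instead of unlinking inside the database: export with db-out, replace the faulty examples in Python, and load the result into a fresh dataset with db-in. This is a different route from the unlink approach mentioned above, not a description of it. It assumes the corrected examples keep the same "_task_hash" as the originals, and the function name is made up for this example.

```python
def replace_examples(examples, corrections):
    """Return a new list where corrected examples replace the originals.

    corrections maps a task hash to the corrected example. Examples without
    a matching hash are kept unchanged.
    """
    return [corrections.get(eg.get("_task_hash"), eg) for eg in examples]


# Tiny in-memory example: fix the label of the second annotation.
exported = [
    {"_task_hash": 1, "image": "a.jpg", "spans": [{"label": "noa"}]},
    {"_task_hash": 2, "image": "b.jpg", "spans": [{"label": "noa"}]},
]
fixed = {2: {"_task_hash": 2, "image": "b.jpg", "spans": [{"label": "sok"}]}}
updated = replace_examples(exported, fixed)
```

Writing to a new dataset keeps the original annotations intact, which makes it easier to audit what was changed and why.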