What is Prodigy's behaviour in annotating two identical images?

zagreus · September 6, 2021, 9:13am

Hypothetically, in a computer vision job, there can be two identical images (i.e. identical base64-encoded image strings) in the input JSONL file. In this scenario, does Prodigy show the same image twice, or does it only present the task once?

Apologies if the answer is already on an FAQ page somewhere. I spent about 15 minutes browsing past topics and the documentation and wasn't able to find it myself.

Thank you!

ines · September 7, 2021, 2:37am

Hi! By default, those two images (assuming they're actually identical) would receive the same _input_hash values so Prodigy would consider them identical. If you're using a workflow like image.manual or another recipe that excludes based on input, you would only see this image once and the second image would be skipped. You can also assign your own hashes if you want to, e.g. if you want to treat two images as identical, even though their bytes are slightly different.
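To make the idea concrete, here's a minimal sketch of input-based deduplication in plain Python. It doesn't call Prodigy at all: `input_hash` and `dedupe_by_input` are hypothetical helpers, and the SHA-1 digest is just a stand-in for whatever hash function Prodigy uses internally. The only thing taken from Prodigy is the `_input_hash` key name. The sketch assumes that a hash already present on a task is respected, which is what lets you assign your own.

```python
import hashlib
import json


def input_hash(task, input_keys=("image",)):
    """Hypothetical stand-in: hash only the input fields of a task."""
    payload = json.dumps({k: task[k] for k in input_keys if k in task}, sort_keys=True)
    return int(hashlib.sha1(payload.encode("utf8")).hexdigest(), 16)


def dedupe_by_input(stream):
    """Yield each task once per distinct _input_hash, skipping later repeats."""
    seen = set()
    for task in stream:
        # Keep a hash the user already assigned, otherwise compute one.
        if "_input_hash" not in task:
            task["_input_hash"] = input_hash(task)
        if task["_input_hash"] in seen:
            continue  # same input as an earlier task, so skip it
        seen.add(task["_input_hash"])
        yield task


stream = [
    # Two byte-identical images: same computed hash, second one is skipped.
    {"image": "data:image/png;base64,AAAA"},
    {"image": "data:image/png;base64,AAAA"},
    # Two different images given the same custom hash: treated as identical.
    {"image": "data:image/png;base64,AAAB", "_input_hash": 12345},
    {"image": "data:image/png;base64,AAAC", "_input_hash": 12345},
]
unique = list(dedupe_by_input(stream))
```

In this toy stream, four tasks go in and only the first of each pair comes out.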

There are of course workflows where you want to exclude based on the task hash instead (a combination of the input + annotations), for example if you're classifying images with binary labels and want to ask multiple questions about the same image. This section has some more details on how hashing and deduplication work: https://prodi.gy/docs/api-loaders#hashing
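The difference between the two hashes can be illustrated with another small plain-Python sketch. Again, none of this is Prodigy's actual code: `add_hashes` and `filter_seen` are hypothetical helpers, SHA-1 is a stand-in hash, and treating the task hash as "input hash plus label" is a simplifying assumption. Only the `_input_hash` and `_task_hash` key names come from Prodigy.

```python
import hashlib
import json


def _digest(obj):
    """Stand-in hash: SHA-1 over a canonical JSON dump, as an integer."""
    data = json.dumps(obj, sort_keys=True).encode("utf8")
    return int(hashlib.sha1(data).hexdigest(), 16)


def add_hashes(task):
    """Return a copy of the task with illustrative input and task hashes."""
    task = dict(task)
    # Input hash: depends only on the raw input (here, the image).
    task["_input_hash"] = _digest({"image": task["image"]})
    # Task hash: input hash combined with the question being asked (the label).
    task["_task_hash"] = _digest({"input": task["_input_hash"], "label": task.get("label")})
    return task


def filter_seen(tasks, key):
    """Keep the first task for each distinct value of the given hash key."""
    seen = set()
    kept = []
    for task in tasks:
        if task[key] not in seen:
            seen.add(task[key])
            kept.append(task)
    return kept


image = "data:image/png;base64,AAAA"
questions = [
    add_hashes({"image": image, "label": "CAT"}),
    add_hashes({"image": image, "label": "DOG"}),
    add_hashes({"image": image, "label": "CAT"}),  # exact repeat of the first question
]
by_input = filter_seen(questions, "_input_hash")  # one task: the image is shown once
by_task = filter_seen(questions, "_task_hash")    # two tasks: CAT and DOG questions
```

Excluding by input would drop the "DOG" question because the image was already seen, while excluding by task keeps both questions and only drops the exact repeat.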
