Image classification (choice) - Duplicated images

Hi,

I’ve been annotating images with a choice interface using the following recipe:

import pickle
import prodigy
from prodigy.components.loaders import Images

@prodigy.recipe('image-choice')
def image_choice(dataset, source):
    stream = Images(source)
    options_file = 'options.p'  # Dictionary with my custom options
    stream = add_options_image(stream, options_file)

    return {
        'dataset': dataset,
        'stream': stream,
        'view_id': 'choice',
        'config': {'choice_style': 'multiple', 'show_stats': True},
    }

def add_options_image(stream, options_file):
    with open(options_file, 'rb') as fp:
        options = pickle.load(fp)

    for task in stream:
        task['options'] = options
        yield task

The image directory I reference when calling the recipe has about 2,000 JPEG files.

The app works and I get to annotate the images exactly as I wanted, except for one problem: repeated images when I relaunch the app.

After doing around 300 annotations, I closed the app to export the database, and repeated this process two or three times. Every time I launch the app, the first image Prodigy suggests is always the same.

I looked into my database export: the meta.file field shows 24 duplicates (same file), and the same goes for _input_hash – 24 duplicates. The image field counts 59 duplicates, but this might be because different files contain the same image. _task_hash shows 0 duplicates.
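For anyone wanting to reproduce this kind of check: here's a small, hypothetical sketch of how to count duplicates in a JSONL export using only the standard library (the hash values and file names below are made up for illustration):

```python
import json
from collections import Counter

# A few lines as they might appear in a Prodigy JSONL export
# (hypothetical values, just for illustration)
lines = [
    '{"_input_hash": 111, "_task_hash": 901, "meta": {"file": "a.jpg"}}',
    '{"_input_hash": 111, "_task_hash": 902, "meta": {"file": "a.jpg"}}',
    '{"_input_hash": 222, "_task_hash": 903, "meta": {"file": "b.jpg"}}',
]
examples = [json.loads(line) for line in lines]

def count_duplicates(examples, key):
    """Count how many examples are surplus copies under the given key."""
    counts = Counter(key(eg) for eg in examples)
    return sum(n - 1 for n in counts.values() if n > 1)

print(count_duplicates(examples, lambda eg: eg["_input_hash"]))   # 1
print(count_duplicates(examples, lambda eg: eg["meta"]["file"]))  # 1
print(count_duplicates(examples, lambda eg: eg["_task_hash"]))    # 0
```

The same pattern applies to a real export: read the file line by line, parse each line with `json.loads` and feed the key you care about into the counter.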

How can I prevent Prodigy from asking me to annotate a file which has been annotated before?

PS: I chose to create a new issue because it only came up today and is different from yesterday’s.

Thanks for opening this as a separate thread – definitely good to keep the threads focused! :+1:

This is definitely strange – it seems like tasks with the same input somehow receive different task hashes over different runs? The _input_hash is based on the value of "image", while the _task_hash takes the input hash, plus the "spans", "label", and "options" properties into account, if available.

Is there anything in your options that could possibly change between sessions? Like, when you unpickle the file with the options or something like that? Even a tiny difference would cause the task to receive the same input hash (because same image), but a different task hash – which makes Prodigy think they’re different questions.
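To make the effect concrete, here's a rough simulation of the hashing idea (not Prodigy's actual implementation – the real hashes are computed differently): the input hash covers only the input fields, while the task hash also covers annotation-relevant fields like "options".

```python
import hashlib
import json

def make_hash(task, keys):
    # Serialize only the selected keys deterministically, then hash
    data = json.dumps({k: task.get(k) for k in keys}, sort_keys=True)
    return hashlib.md5(data.encode("utf8")).hexdigest()

# Same image, but the options changed between sessions
task_v1 = {"image": "photo_001.jpg", "options": [{"id": "CAT"}]}
task_v2 = {"image": "photo_001.jpg", "options": [{"id": "CAT"}, {"id": "DOG"}]}

input_keys = ["image"]
task_keys = ["image", "options"]

# Same image -> same input hash across sessions
print(make_hash(task_v1, input_keys) == make_hash(task_v2, input_keys))  # True
# Different options -> different task hash, so it looks like a new question
print(make_hash(task_v1, task_keys) == make_hash(task_v2, task_keys))    # False
```

This is exactly the situation where duplicates by input hash can coexist with zero duplicates by task hash.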

If you know that you’re only ever going to ask one question about one image, you could also set your own hashes and base both the input hash and task hash on the value of "image", which shouldn’t change. Prodigy will accept pre-defined hashes that are already set in the stream. For example:

for task in stream:
    task = prodigy.set_hashes(task, input_keys=["image"], task_keys=["image"])
    # and so on

Thanks Ines! Indeed I was adding new choice options every time. This is quite a normal procedure for me, I guess it could be useful for you to understand my use case:

  1. Do a few annotations with the first iteration of options. If none of the options fits the image, I hit Ignore and write down on paper the new option to add to my list
  2. I export the annotations, check the stats on the different labels
  3. From the missing options I took a note of, I add to the recipe the ones which make more sense
  4. I start Prodigy again, annotating more images, now with more options. However, I don’t want to go back and annotate images which I have previously annotated (even the Ignored ones, because I have many more images and can afford to ignore a bunch).

When I am starting a new dataset/problem, I usually go through these steps for a few iterations until I have a stable set of options.

That being said, I found a workaround where I move already annotated images to another dataset in step 2, but your solution of basing the hash on the input only should work as well.

I never noticed this issue when annotating text because I usually work with datasets with many thousands of documents and I stream random batches of documents each time.

Ah, cool – glad you figured it out! And thanks for sharing your workflow :blush:

From what you describe, it sounds like you might actually want to write your own logic that filters and excludes examples based on their input hashes. This gives you more control over how to handle duplicates at different stages of your workflow.

Custom recipes can return an on_load function that’s called when the recipe loads. It gives you access to the controller and database, so you can fetch all input hashes of the dataset when your recipe loads. In your stream, you can then check the incoming examples and only send them out if their input hash hasn’t been seen before. You could even apply more fine-grained logic here – like, only send it out if it has been seen before and meets other conditions (e.g. if you set a custom --reannotate flag on your recipe, or if the example has a certain property and so on).

This example recipe shows the use of the on_load method to get data from the database. Here, it’s only keeping counts of the answer types – but you could use the same logic to keep a set of the seen hashes.
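Here's a minimal sketch of the filtering idea behind that approach. In a real recipe you'd fill `seen_hashes` from the database inside the on_load callback; here it's a plain set so the example runs stand-alone, and the `--reannotate` flag is the hypothetical one mentioned above:

```python
# Input hashes already in the dataset – in a real recipe this would come
# from the database when the recipe loads (e.g. via an on_load callback)
seen_hashes = {111, 333}

def filter_seen_inputs(stream, seen_hashes, reannotate=False):
    for eg in stream:
        already_seen = eg["_input_hash"] in seen_hashes
        # By default only send out unseen examples; with a custom
        # --reannotate flag, send out only the previously seen ones instead
        if already_seen == reannotate:
            yield eg

stream = [
    {"_input_hash": 111, "image": "a.jpg"},
    {"_input_hash": 222, "image": "b.jpg"},
]
print([eg["image"] for eg in filter_seen_inputs(stream, seen_hashes)])
# ['b.jpg']
```

The same generator pattern works for any other condition you want to attach to "seen before".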

Thanks Ines! I’ll definitely try this out


Hi Ines! I picked up this suggestion from you again. Could you give me a short working example of how to use the on_load function to filter the examples in the stream by input hash instead of task hash?

Just realised you can also do it without returning the on_load callback, so here’s a super minimal version that shows the idea:

from prodigy.components.filters import filter_inputs
from prodigy.components.db import connect

# In your recipe function
db = connect()
input_hashes = db.get_input_hashes(dataset)

stream = []  # your stream here
stream = filter_inputs(stream, input_hashes)

Internally, all the filter_inputs helper really does is something like this:

def filter_inputs(stream, input_hashes):
    for eg in stream:
        if eg["_input_hash"] not in input_hashes:
            yield eg

That’s the underlying logic for filtering examples, so you can also write your own function and implement something custom.

Thanks Ines!

Two questions:

  1. When I don’t add this to my recipe, at which stage is the filter by _task_hash done?
  2. If I applied this same logic to an NLP job, would it be possible to filter by a unique identifier I have in the meta field of each example? (using the JSONL loader)

Thanks!

In the “controller”, so after your recipe function was executed and has returned its components, and before Prodigy starts up the annotation server.

Yes, absolutely. The entire task dictionary will be saved in the database, and you can get all existing annotations for a given dataset in the database. Let’s say your examples look like this:

{"text": "Hello world", "meta": {"id": 123}}
{"text": "Blah blah", "meta": {"id": 456}}

When you annotate them, they’ll be saved to the dataset. In your recipe, you can then call db.get_dataset to load them and get the meta.id field from each example. You now have a list of values that you can compare the incoming examples against.

from prodigy.components.db import connect

db = connect()
examples = db.get_dataset(dataset)
# Collect the meta.id values as a set for fast membership checks
meta_ids = {eg["meta"]["id"] for eg in examples}

def filter_stream(stream):
    for eg in stream:
        if eg["meta"]["id"] not in meta_ids:
            yield eg

If you can express it in Python, you can add pretty much any conditional logic here. It’s probably not very useful, but you could even send an example out if its text is longer than X characters, or if it was annotated before but rejected and its ID is Y and some other custom meta property is Z. Or you could send a certain example out only if today is Monday or Tuesday :sweat_smile: