Duplicate images in image.manual


First of all, great software. I just started using prodigy and I'm really liking it. Thanks.

I know that there are plenty of other threads about duplicate examples, but none of them solved my problem. They mostly seem to deal with complicated multi-user setups, whereas mine is much simpler and probably just a big misunderstanding on my part.

My situation:
I have a clean prodigy installation (v1.11.6) and hence an empty prodigy.json. So no funny business there. My task: I am given a folder containing 50 images that I need to label (bounding boxes of objects of class A).

prodigy image.manual 'test' test/ --label A --width 2000 --remove-base64

works like a charm: after 50 images I get "No tasks available", and everything is labelled. However, when I stop and save my annotations halfway through and come back the next day to continue (by repeating exactly the same command as above), I am presented with images I have already labelled. Labelling them again results in duplicate entries in the database (identical hashes and everything).

Ideally I should only see examples that are not already annotated, and I thought that would be the case, given that "auto_exclude_current" should default to true (setting it explicitly to true in the prodigy.json does not change the behavior).

Is there anything I am missing? Why do I see images that I already labelled even though the parameter is set? Also, more generally, is there a manual of some sort that explains the concepts (or design principles) behind prodigy?

I hope I didn't miss any important information.

Thanks in advance,

Thanks for the detailed report. This definitely sounds strange, especially since your workflow is very straightforward. The exclusion logic is simple as well: if an example's hash is already in the database, the example is skipped; if not, you will see it again. So as long as the hashes don't change (which could happen if, for example, the image files themselves change), you shouldn't see an already annotated example again.
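To illustrate the exclusion logic described above, here is a minimal Python sketch. It is not prodigy's implementation: `compute_task_hash` and `filter_already_annotated` are hypothetical names, and hashing the `image` field with `hashlib` is only a stand-in for prodigy's own hashing scheme. The `_task_hash` key mirrors the field prodigy adds to each example (see the hashing docs).

```python
import hashlib
from typing import Dict, Iterable, Iterator, Set


def compute_task_hash(example: Dict) -> int:
    # Hypothetical stand-in for prodigy's hashing: derive a stable integer
    # from the example's "image" field. Same content -> same hash.
    digest = hashlib.sha1(example["image"].encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def filter_already_annotated(
    stream: Iterable[Dict], db_hashes: Set[int]
) -> Iterator[Dict]:
    # Attach a hash to each incoming example and skip it if that hash
    # is already stored in the database; everything else is passed on.
    for example in stream:
        example["_task_hash"] = compute_task_hash(example)
        if example["_task_hash"] in db_hashes:
            continue
        yield example


if __name__ == "__main__":
    # Pretend the first two images were annotated and saved yesterday.
    saved = [{"image": "img_001.jpg"}, {"image": "img_002.jpg"}]
    db_hashes = {compute_task_hash(eg) for eg in saved}

    # Restarting with the same input folder produces the same stream.
    stream = [{"image": f"img_00{i}.jpg"} for i in range(1, 5)]
    remaining = [eg["image"] for eg in filter_already_annotated(stream, db_hashes)]
    print(remaining)  # ['img_003.jpg', 'img_004.jpg']
```

If the content that feeds into the hash changes between sessions, the new hash won't match anything in the database and the example is shown again, which is why stable hashes matter here.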

We'll definitely investigate this and see if we can somehow reproduce the problem!

The hashing and deduplication is explained here: https://prodi.gy/docs/api-loaders#hashing