Filtering previously annotated images not working

c00lcoder · August 31, 2022, 12:46am

Hey, I am having an issue filtering out previously annotated images. I'm using the image manual and choice UI. Here is what I've tried, using version 1.11.8:

(1) This is in my custom recipe stream function utilizing the filter_inputs component

    def get_stream():
        input_hashes = db.get_input_hashes(dataset)
        stream = filter_inputs(fetch_media(JSONL(source), ["image"], skip=True), input_hashes)

I also split the stream definition into two steps, and it still didn't work.

    def get_stream():
        input_hashes = db.get_input_hashes(dataset)
        stream = fetch_media(JSONL(source), ["image"], skip=True)
        stream = filter_inputs(stream, input_hashes)

(2) I also attempted to use the exclude feature via terminal

prodigy image2.manual People -F ./image2_manual_choice.py --loader jsonl ./default.jsonl --label Person --remove-base64 --exclude People

(3) Then I added exclude_by to the config section of the recipe. When I replaced "exclude": exclude with "exclude": dataset, I got an error, so I just added exclude_by to the config section. Here is the entire return section:

return {
        
        "view_id": "blocks",  # Annotation interface to use
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": get_stream(),  # Incoming stream of examples
        "before_db": before_db if remove_base64 else None,
        "exclude": exclude,  # List of dataset names to exclude
        "config": {  # Additional config settings, mostly for app UI
            "blocks": blocks,
            "label": ", ".join(label) if label is not None else "all",
            "labels": label,  # Selectable label options,
            "darken_image": 0.2 if darken else 0,
            "show_bounding_box_center": True,
            "show_bounding_box_size": True,
            "choice_style": "single", 
            "show_stats": True,
            "exclude_by": "input"
        },
}

(4) I set feed overlap to false in the prodigy.json

"feed_overlap": false,

I double-checked the input hashes and they are all unique, with no duplicates:

{-235569408, 738755073, 1839968642, -1119228798, -1190063994, 700069767, -167115513, -1644946036, 1493465997, 146816014, -744640628, -1419850100, 391032081, 982284305, -1267927278, -1483868138, 2000863766, -1572823395, 455247389, -361435228, -804305371, 1227481510, 1248603173, -188300123, 1380958762, -2014770132, 1101691436, -1731878991, -1028536013, -1976674506, 199000886, -85850824, -77778375, -1104109509, 2093823932, -2141856195, 1479689023, -448897342, 491288259, 1420571076, -1717574076, -1952567689, 1715495880, -617941175, 1968186705, -911627567, -1266395183, -1779998121, -571232041, 324505559, 835637210, 1250949595, -1829150884, 1944036953, 736088695, 1372305384, -1234183829, 1161711212, 573638252, -1540926098, 436132336, -1924832395, -684408969, 1343556345}

It's probably something simple that I'm missing here. Anything I should look at to filter out already annotated images?

Jette16 · August 31, 2022, 6:58am

Hi @c00lcoder,

Thank you for your question.

I was able to reproduce your issue, and my solution was to use the Prodigy function set_hashes (see: https://prodi.gy/docs/api-components#set_hashes) after loading the data and before filter_inputs:

stream = (set_hashes(eg) for eg in stream)
stream = filter_inputs(stream, input_hashes)

Maybe this already solves your problem too? If not, could you send the recipe you are using and an example of your input data?
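
For intuition, here is a minimal, self-contained sketch of why the order matters. It does not use Prodigy itself: toy_set_hashes and toy_filter_inputs are hypothetical stand-ins, the hashing is not Prodigy's, and it assumes the filter compares each example's "_input_hash" against the stored hashes. Under that assumption, examples that carry no hash yet can never match, so nothing is filtered out.

```python
import hashlib
import json


def toy_set_hashes(eg, input_keys=("image",)):
    """Hypothetical stand-in for set_hashes: add an _input_hash if missing."""
    if "_input_hash" not in eg:
        payload = json.dumps({k: eg.get(k) for k in input_keys}, sort_keys=True)
        digest = hashlib.md5(payload.encode("utf8")).hexdigest()
        # Copy the example so the original dict stays unhashed.
        eg = {**eg, "_input_hash": int(digest[:8], 16)}
    return eg


def toy_filter_inputs(stream, seen_hashes):
    """Hypothetical stand-in for filter_inputs: skip examples already seen."""
    for eg in stream:
        if eg.get("_input_hash") not in seen_hashes:
            yield eg


examples = [{"image": "a.jpg"}, {"image": "b.jpg"}]

# Pretend a.jpg was annotated earlier, so its hash is stored in the dataset.
seen = {toy_set_hashes(examples[0])["_input_hash"]}

# Without hashes on the incoming examples, nothing matches the stored hashes.
unhashed = list(toy_filter_inputs(iter(examples), seen))

# With hashes assigned first, the already annotated image is dropped.
hashed = list(toy_filter_inputs((toy_set_hashes(eg) for eg in examples), seen))

print(len(unhashed), [eg["image"] for eg in hashed])
```

In the sketch, the unhashed stream passes through untouched, while the hashed stream only keeps the image that was not annotated before. This mirrors the fix above: assign hashes first, then filter.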

c00lcoder · August 31, 2022, 9:59am

Thanks, worked like a charm!!
