Filtering previously annotated images not working

Hey, I am having an issue filtering out previously annotated images. I'm using the image manual and choice UI. Here is what I've tried, using version 1.11.8:

(1) This is in my custom recipe stream function utilizing the filter_inputs component

    def get_stream():
        input_hashes = db.get_input_hashes(dataset)
        stream = filter_inputs(fetch_media(JSONL(source), ["image"], skip=True), input_hashes)

I separated the stream definitions and it still didn't work.

    def get_stream():
        input_hashes = db.get_input_hashes(dataset)
        stream = fetch_media(JSONL(source), ["image"], skip=True)
        stream = filter_inputs(stream, input_hashes)

(2) I also attempted to use the exclude feature via the terminal:

prodigy image2.manual People -F ./image2_manual_choice.py --loader jsonl ./default.jsonl --label Person --remove-base64 --exclude People

(3) Then I added exclude_by to the config section of the recipe. When I replaced "exclude": exclude with "exclude": dataset, I got an error, so I just added exclude_by to the config section. Here is the entire return section:

return {
        
        "view_id": "blocks",  # Annotation interface to use
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": get_stream(),  # Incoming stream of examples
        "before_db": before_db if remove_base64 else None,
        "exclude": exclude,  # List of dataset names to exclude
        "config": {  # Additional config settings, mostly for app UI
            "blocks": blocks,
            "label": ", ".join(label) if label is not None else "all",
            "labels": label,  # Selectable label options,
            "darken_image": 0.2 if darken else 0,
            "show_bounding_box_center": True,
            "show_bounding_box_size": True,
            "choice_style": "single", 
            "show_stats": True,
            "exclude_by": "input"
        },

(4) I set feed overlap to false in the prodigy.json

"feed_overlap": false,

I double-checked the input hashes and they are all unique, with no duplicates:

{-235569408, 738755073, 1839968642, -1119228798, -1190063994, 700069767, -167115513, -1644946036, 1493465997, 146816014, -744640628, -1419850100, 391032081, 982284305, -1267927278, -1483868138, 2000863766, -1572823395, 455247389, -361435228, -804305371, 1227481510, 1248603173, -188300123, 1380958762, -2014770132, 1101691436, -1731878991, -1028536013, -1976674506, 199000886, -85850824, -77778375, -1104109509, 2093823932, -2141856195, 1479689023, -448897342, 491288259, 1420571076, -1717574076, -1952567689, 1715495880, -617941175, 1968186705, -911627567, -1266395183, -1779998121, -571232041, 324505559, 835637210, 1250949595, -1829150884, 1944036953, 736088695, 1372305384, -1234183829, 1161711212, 573638252, -1540926098, 436132336, -1924832395, -684408969, 1343556345}

It's probably something simple that I'm missing here. Anything I should look at to filter out already annotated images?

Hi @c00lcoder,

thank you for your question.

I was able to reproduce your issue, and my solution was to use the Prodigy function set_hashes (see: https://prodi.gy/docs/api-components#set_hashes) after loading the data and before filter_inputs, so that each incoming example carries an _input_hash for filter_inputs to compare against:

stream = (set_hashes(eg) for eg in stream)
stream = filter_inputs(stream, input_hashes)

Maybe this already solves your problem too? If not, could you send the recipe you are using and an example of your input data?
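
To illustrate why the order matters, here is a minimal sketch that runs without Prodigy installed. Note that toy_set_hashes and toy_filter_inputs are hypothetical stand-ins written for this example, not Prodigy's actual implementations: the real set_hashes uses a different hashing scheme, and I am assuming that filter_inputs lets examples without an _input_hash pass through unfiltered, which would match the behaviour you describe.

```python
import hashlib
import json


def toy_set_hashes(eg, input_keys=("image",)):
    """Hypothetical stand-in for set_hashes: adds an _input_hash derived
    from the input keys if the example does not have one yet."""
    if "_input_hash" not in eg:
        payload = json.dumps({k: eg.get(k) for k in input_keys}, sort_keys=True)
        digest = hashlib.md5(payload.encode("utf8")).hexdigest()
        eg["_input_hash"] = int(digest[:8], 16)
    return eg


def toy_filter_inputs(stream, input_hashes):
    """Hypothetical stand-in for filter_inputs: drops examples whose
    _input_hash is in input_hashes. Assumption: an example with no
    _input_hash never matches, so it passes through."""
    for eg in stream:
        if eg.get("_input_hash") not in input_hashes:
            yield eg


raw = [{"image": "a.jpg"}, {"image": "b.jpg"}, {"image": "c.jpg"}]

# Pretend a.jpg and b.jpg were annotated earlier and saved to the dataset.
seen_hashes = {toy_set_hashes(dict(eg))["_input_hash"] for eg in raw[:2]}

# Freshly loaded examples have no _input_hash, so nothing gets filtered.
unhashed_result = list(toy_filter_inputs((dict(eg) for eg in raw), seen_hashes))

# With hashes assigned first, the two annotated images are skipped.
hashed_stream = (toy_set_hashes(dict(eg)) for eg in raw)
hashed_result = list(toy_filter_inputs(hashed_stream, seen_hashes))

print(len(unhashed_result), [eg["image"] for eg in hashed_result])
```

In the toy version, the unhashed stream keeps all three images, while the hashed stream keeps only the one that was not annotated yet. If your real stream behaves like the first case, missing hashes are the likely cause.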


Thanks, worked like a charm!!
