Exclude flag in custom recipe not excluding examples

I'm having an issue where Prodigy continues to present examples that I've already evaluated, even though I am using the exclude flag. My actual recipe is a bit more complicated, but I've reproduced the problem with a simple one:

import prodigy


@prodigy.recipe(
    "my-simple-recipe",
    dataset=prodigy.recipe_args["dataset"],
    exclude=prodigy.recipe_args["exclude"],
)
def my_simple_recipe(dataset, exclude):
    # The same three choice options are attached to every task.
    options = [
        {"id": cat, "text": cat}
        for cat in ["a", "b", "c"]
    ]

    def load_stream():
        # Yield 100 tasks with predictable text.
        for i in range(100):
            task = {
                "text": f"{i} is the {i}th element",
                "options": options,
            }
            yield task

    return {
        "view_id": "choice",
        "stream": load_stream(),
        "exclude": exclude,
        "dataset": dataset,
    }

I run the recipe like so:

prodigy my-simple-recipe test_data --exclude test_data -F simple_recipe.py

Then I categorize the first few examples. A db-out of test_data shows:

{"text":"0 is the 0th element","options":[{"id":"a","text":"a"},{"id":"b","text":"b"},{"id":"c","text":"c"}],"_input_hash":-1544660227,"_task_hash":2109103467,"_session_id":null,"_view_id":"choice","config":{"choice_style":"single"},"accept":["a"],"answer":"accept"}
{"text":"1 is the 1th element","options":[{"id":"a","text":"a"},{"id":"b","text":"b"},{"id":"c","text":"c"}],"_input_hash":-1462450044,"_task_hash":1917332331,"_session_id":null,"_view_id":"choice","accept":["b"],"config":{"choice_style":"single"},"answer":"accept"}
{"text":"2 is the 2th element","options":[{"id":"a","text":"a"},{"id":"b","text":"b"},{"id":"c","text":"c"}],"_input_hash":-580414110,"_task_hash":1095989901,"_session_id":null,"_view_id":"choice","accept":["c"],"config":{"choice_style":"single"},"answer":"accept"}

Now, if I come back and run the recipe again, I'd expect to see the text "3 is the 3th element", but instead I see "0 is the 0th element" again. I've confirmed that the input and task hashes for this example are exactly the same as those already in the dataset by using set_hashes as described here. I also tried calling set_hashes inside the stream generator, per the example, but that didn't help either. I'm confused about why the examples aren't being excluded as expected.

Any guidance on how to make this work? Thank you.
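
For readers following along, the expected behaviour can be sketched in plain Python without Prodigy installed. This is only an illustration: toy_task_hash and filter_seen are hypothetical helpers, and the hash used here is a stand-in, not Prodigy's actual hashing algorithm. It shows the idea that tasks whose hash already appears in the dataset should be skipped, so the first task served on a second run is the fourth one.

```python
import hashlib
import json


def toy_task_hash(task):
    """Stand-in hash (NOT Prodigy's algorithm): equal content gives equal hash."""
    payload = json.dumps(
        {"text": task["text"], "options": task["options"]}, sort_keys=True
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def load_stream(n=100):
    """Same deterministic stream as in the recipe above."""
    options = [{"id": cat, "text": cat} for cat in ["a", "b", "c"]]
    for i in range(n):
        yield {"text": f"{i} is the {i}th element", "options": options}


def filter_seen(stream, seen_hashes):
    """Skip every task whose hash is already in the set of seen hashes."""
    for task in stream:
        if toy_task_hash(task) not in seen_hashes:
            yield task


# Pretend the first three tasks were annotated and saved in an earlier session.
annotated = list(load_stream(3))
seen = {toy_task_hash(task) for task in annotated}

# On the next run, the first task served should be the fourth one.
first_remaining = next(filter_seen(load_stream(), seen))
print(first_remaining["text"])  # 3 is the 3th element
```

Because the stream is deterministic, the hashes are identical across runs, which is why the repeated "0 is the 0th element" in the report above points at the exclusion step rather than at the hashes.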

Hi! Which version of Prodigy are you using? Can you upgrade to the latest version or, failing that, check whether setting "feed_overlap": false in your config solves the problem?
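
A minimal sketch of applying that setting from Python, assuming the config in question is a prodigy.json file (for example in the working directory or the Prodigy home directory). The helper name set_feed_overlap is hypothetical, and the "batch_size" entry in the demo is just an example of an existing setting that should be preserved. The demo writes to a temporary directory so it does not touch a real config.

```python
import json
import tempfile
from pathlib import Path


def set_feed_overlap(config_path, value=False):
    """Merge "feed_overlap" into a prodigy.json file, keeping other keys."""
    path = Path(config_path)
    # Start from the existing settings if the file is already there.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config["feed_overlap"] = value
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file rather than a real prodigy.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "prodigy.json"
    demo_path.write_text('{"batch_size": 10}', encoding="utf-8")
    result = set_feed_overlap(demo_path)
    written = json.loads(demo_path.read_text(encoding="utf-8"))
```

Editing the JSON file by hand achieves the same thing; the helper only makes sure existing keys are not overwritten.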

I was on 1.10.4. I just updated to 1.10.5 and now it works as expected. (I did not set "feed_overlap": false, since the release notes say its default value is false.)

Thank you!
