Hi! The exclude returned by your recipe should in theory take care of this, yes. In your case, filter_tasks is not going to make a difference unless the incoming examples from file_path already include hashes. If they don't, there's nothing to compare against. So if you want to handle the filtering in your recipe, you also need to assign the hashes yourself:
hashed_stream = (prodigy.set_hashes(eg) for eg in stream)
Also double-check that nothing in your examples is changing between sessions. For example, if you're adding different options before hashing examples, the hashes are going to reflect that. So the same text with different options will receive different task hashes, which means Prodigy will treat those like different questions (which makes sense).
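To make that concrete, here's a rough, standalone sketch of the idea. It does not use Prodigy at all: toy_task_hash and filter_seen are hypothetical stand-ins I made up for illustration, and the hashing here is not Prodigy's actual algorithm. It only shows why the same text with different options ends up with a different hash and therefore isn't filtered out.

```python
import hashlib
import json


def toy_task_hash(example, keys=("text", "options")):
    """Hypothetical stand-in for a task hash (NOT Prodigy's real algorithm).

    Hashes only the selected keys, so any change to the options
    produces a different hash for the same text.
    """
    relevant = {k: example[k] for k in keys if k in example}
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def filter_seen(stream, seen_hashes):
    """Yield only examples whose toy hash is not in seen_hashes."""
    for eg in stream:
        if toy_task_hash(eg) not in seen_hashes:
            yield eg


options_a = [{"id": "POS", "text": "Positive"}, {"id": "NEG", "text": "Negative"}]
options_b = options_a + [{"id": "NEU", "text": "Neutral"}]

first_session = {"text": "Great product", "options": options_a}
same_again = {"text": "Great product", "options": options_a}
changed_options = {"text": "Great product", "options": options_b}

# Pretend first_session was annotated earlier and its hash was stored.
seen = {toy_task_hash(first_session)}

# same_again is dropped; changed_options passes through as a "new" question.
remaining = list(filter_seen([same_again, changed_options], seen))
```

So if examples you've already annotated keep coming back, it's worth comparing the hashes from both sessions for the same text to see whether something like the options changed in between.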
See this thread for a similar use case and more details: