Hi.
I am going through the documentation and it seems that this example doesn't work (at least not on my computer). My Prodigy version is 1.10.5.
This is the code you posted in the documentation: link to code
from prodigy.components.filters import filter_duplicates
stream = [{"text": "foo", "label": "bar"}, {"text": "foo", "label": "bar"}, {"text": "foo"}]
stream = filter_duplicates(stream, by_input=False, by_task=True)
# [{'text': 'foo', 'label': 'bar'}, {'text': 'foo'}]
stream = filter_duplicates(stream, by_input=True, by_task=True)
# [{'text': 'foo', 'label': 'bar'}]
To iterate over the stream and see its elements, I added:
list(stream)
This raises the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-159-36d117afad89> in <module>
5
6 stream = filter_duplicates(stream, by_input=False, by_task=True)
----> 7 list(stream)
cython_src/prodigy/components/filters.pyx in filter_duplicates()
KeyError: '_task_hash'
I don't know whether this is the expected behavior. If it is, I think the documentation would be better with a reproducible example that runs as written.
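My guess from the traceback is that the filter looks up a `_task_hash` key on every task, and the plain dicts in the example don't have one. Below is a minimal pure-Python sketch of that guess. It is not Prodigy's actual implementation: `add_hashes` and `dedupe` are hypothetical helpers I wrote for illustration, and the choice of `text` as the input key and `label` as the task key is my assumption. I believe Prodigy ships its own `set_hashes` helper that does the real version of the hashing step, but I haven't checked how it behaves in 1.10.5.

```python
import hashlib
import json


def add_hashes(task, input_keys=("text",), task_keys=("label",)):
    """Hypothetical stand-in for a hashing step (not Prodigy's function)."""
    def digest(keys):
        payload = json.dumps({k: task.get(k) for k in keys}, sort_keys=True)
        return hashlib.md5(payload.encode("utf8")).hexdigest()

    hashed = dict(task)
    hashed["_input_hash"] = digest(tuple(input_keys))
    # Assumption: the task hash covers the input keys plus the task keys.
    hashed["_task_hash"] = digest(tuple(input_keys) + tuple(task_keys))
    return hashed


def dedupe(stream, by_input=False, by_task=True):
    """Hypothetical filter that indexes the hash keys directly."""
    seen_inputs, seen_tasks = set(), set()
    for eg in stream:
        # Direct indexing raises KeyError when no hash was set on the task.
        if by_task and eg["_task_hash"] in seen_tasks:
            continue
        if by_input and eg["_input_hash"] in seen_inputs:
            continue
        if by_task:
            seen_tasks.add(eg["_task_hash"])
        if by_input:
            seen_inputs.add(eg["_input_hash"])
        yield eg


raw = [{"text": "foo", "label": "bar"}, {"text": "foo", "label": "bar"}, {"text": "foo"}]

# Without hashes, the sketch fails the same way as in my traceback.
try:
    list(dedupe(raw))
    error_message = None
except KeyError as err:
    error_message = str(err)
print("KeyError:", error_message)  # KeyError: '_task_hash'

# With hashes added first, both calls give the results the docs describe.
hashed = [add_hashes(eg) for eg in raw]
only_task = list(dedupe(hashed, by_input=False, by_task=True))
both = list(dedupe(hashed, by_input=True, by_task=True))
print(len(only_task), len(both))  # 2 1
```

If this guess is right, the documentation example would only need a hashing step before the call to `filter_duplicates`. Note that I also rebuild the list for each call in the sketch, since a generator returned by the first filter would already be consumed by the time the second one runs.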
Thanks in advance.
Sergio M.