Hi @ines,
I'm having a problem with the filter_inputs function as well, so I thought I would post it here.
I'm trying to filter my stream by the _input_hash using the filter_inputs function, but I can't get it to work. I've checked the database, and the hashes are the same.
My setup is as follows:
- I've written a custom loader that dumps and prints the JSON. I'm manually adding an _input_hash to each task using Python's built-in hash function.
- I've written a custom recipe using the blocks interface
- To read from sys.stdin, I'm using the `get_stream` function.
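
A loader of the kind described in the first point might look roughly like the sketch below. It is only an illustration: the names make_task and dump_tasks are hypothetical, and hashing the text alone is an assumption about which input gets hashed.

```python
import json
import sys


def make_task(text, meta=None):
    """Build one task dict and attach a manually computed _input_hash."""
    task = {"text": text, "meta": meta or {}}
    # Built-in hash() of a str; by default the value is only stable within
    # a single Python process unless PYTHONHASHSEED is fixed.
    task["_input_hash"] = hash(text)
    return task


def dump_tasks(texts, out=sys.stdout):
    """Write one JSON object per line so a recipe can read it from stdin."""
    for text in texts:
        out.write(json.dumps(make_task(text)) + "\n")


if __name__ == "__main__":
    dump_tasks(["Hej ", "Hej", "Lyssnar"])
```

The output of such a script would be piped into the recipe with '-' as the source argument.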
import prodigy
from prodigy.components.db import connect
from prodigy.components.filters import filter_inputs

# add_options, html2 and functions are defined elsewhere (not shown here).


@prodigy.recipe(
    'task1',
    dataset=('The dataset to store data in', 'positional', None, str),
    source=("Data to annotate (file path or '-' to read from standard input)", "positional", None, str),
)
def task1(dataset, source):
    db = connect()
    # Input hashes of all examples already saved in the dataset
    input_hashes = db.get_input_hashes(dataset)
    stream = prodigy.get_stream(source)
    stream = add_options(stream)
    # Skip incoming tasks whose _input_hash is already in the dataset
    stream = filter_inputs(stream, input_hashes)
    return {
        'stream': stream,
        'dataset': dataset,
        'view_id': 'blocks',
        'config': {
            'blocks': [
                {'view_id': 'html', 'html_template': html2},
                {'view_id': 'choice'},
                {'view_id': 'html', 'html_template': html2},
            ],
            "history_size": 10,
            "choice_style": "single",
            'javascript': functions,
        }
    }
And here is the output from db-out. It consists of three examples, and as you can see by comparing lines (1, 4), (2, 5) and (3, 6), the input hashes are the same:
{"meta":{"id":"1003517020111303_1003525683443770","source":"se_facebook","reaction_count":0,"angry_count":0},"text":"Hej ","_input_hash":943287096108137100,"options":[{"id":0,"text":"Offensive"},{"id":1,"text":"Hateful"},{"id":2,"text":"Violent"},{"id":99,"text":"Hard to say"}],"_task_hash":-1290115455,"_session_id":null,"_view_id":"blocks","config":{"choice_style":"single"},"accept":[],"answer":"accept"}
{"meta":{"id":"1003517020111303_1003525763443762","source":"se_facebook","reaction_count":0,"angry_count":0},"text":"Hej","_input_hash":6904969615189831000,"options":[{"id":0,"text":"Offensive"},{"id":1,"text":"Hateful"},{"id":2,"text":"Violent"},{"id":99,"text":"Hard to say"}],"_task_hash":-1114970609,"_session_id":null,"_view_id":"blocks","accept":[],"config":{"choice_style":"single"},"answer":"accept"}
{"meta":{"id":"1003517020111303_1003525810110424","source":"se_facebook","reaction_count":0,"angry_count":0},"text":"Lyssnar","_input_hash":2672328271787085300,"options":[{"id":0,"text":"Offensive"},{"id":1,"text":"Hateful"},{"id":2,"text":"Violent"},{"id":99,"text":"Hard to say"}],"_task_hash":378550198,"_session_id":null,"_view_id":"blocks","accept":[99],"config":{"choice_style":"single"},"answer":"accept"}
{"meta":{"id":"1003517020111303_1003525683443770","source":"se_facebook","reaction_count":0,"angry_count":0},"text":"Hej ","_input_hash":943287096108137100,"options":[{"id":0,"text":"Offensive"},{"id":1,"text":"Hateful"},{"id":2,"text":"Violent"},{"id":99,"text":"Hard to say"}],"_task_hash":-1290115455,"_session_id":null,"_view_id":"blocks","config":{"choice_style":"single"},"accept":[],"answer":"accept"}
{"meta":{"id":"1003517020111303_1003525763443762","source":"se_facebook","reaction_count":0,"angry_count":0},"text":"Hej","_input_hash":6904969615189831000,"options":[{"id":0,"text":"Offensive"},{"id":1,"text":"Hateful"},{"id":2,"text":"Violent"},{"id":99,"text":"Hard to say"}],"_task_hash":-1114970609,"_session_id":null,"_view_id":"blocks","accept":[],"config":{"choice_style":"single"},"answer":"accept"}
{"meta":{"id":"1003517020111303_1003525810110424","source":"se_facebook","reaction_count":0,"angry_count":0},"text":"Lyssnar","_input_hash":2672328271787085300,"options":[{"id":0,"text":"Offensive"},{"id":1,"text":"Hateful"},{"id":2,"text":"Violent"},{"id":99,"text":"Hard to say"}],"_task_hash":378550198,"_session_id":null,"_view_id":"blocks","accept":[99],"config":{"choice_style":"single"},"answer":"accept"}
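
The line-by-line comparison can also be done with a short script. The sketch below groups line numbers by _input_hash; lines_by_input_hash is a hypothetical helper, and the sample records are trimmed to two fields, with the hash values copied from the output above.

```python
import json
from collections import defaultdict


def lines_by_input_hash(jsonl_lines):
    """Map each _input_hash to the 1-based line numbers where it occurs."""
    groups = defaultdict(list)
    for line_no, line in enumerate(jsonl_lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        groups[json.loads(line)["_input_hash"]].append(line_no)
    return dict(groups)


# Records trimmed to two fields; hash values copied from the db-out output.
sample = [
    '{"text": "Hej ", "_input_hash": 943287096108137100}',
    '{"text": "Hej", "_input_hash": 6904969615189831000}',
    '{"text": "Lyssnar", "_input_hash": 2672328271787085300}',
    '{"text": "Hej ", "_input_hash": 943287096108137100}',
    '{"text": "Hej", "_input_hash": 6904969615189831000}',
    '{"text": "Lyssnar", "_input_hash": 2672328271787085300}',
]

for input_hash, line_numbers in lines_by_input_hash(sample).items():
    print(input_hash, line_numbers)
```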
I've tried out different methods to filter, including setting the hashes in the recipe using set_hashes, and using the dedup param, which fails as the task has no _task_hash at stream time.
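
The behaviour expected from the filtering step can be sketched in plain Python, independent of Prodigy. skip_seen_inputs is a hypothetical helper, not Prodigy's own filter_inputs; it only removes a task when the hash on the incoming task is exactly equal to one of the stored hashes.

```python
def skip_seen_inputs(stream, seen_hashes):
    """Yield only tasks whose _input_hash is not in seen_hashes."""
    seen = set(seen_hashes)
    for task in stream:
        # Tasks without an _input_hash are passed through unchanged.
        if task.get("_input_hash") not in seen:
            yield task


# Hash values copied from the db-out output; records trimmed for brevity.
incoming = [
    {"text": "Hej ", "_input_hash": 943287096108137100},
    {"text": "Hej", "_input_hash": 6904969615189831000},
    {"text": "Lyssnar", "_input_hash": 2672328271787085300},
]
already_annotated = {943287096108137100, 6904969615189831000}

remaining = list(skip_seen_inputs(incoming, already_annotated))
print([task["text"] for task in remaining])
```

With exact matches like these, only the third task is left in the stream.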
Please let me know if you need more information to help me debug this.
Thanks!