Hi Prodigy Team,
We have a custom TextCat recipe in which we don't want the tasks to be rehashed, so we set the rehash parameter of get_stream to False:
stream = get_stream(
    source, loader=loader, rehash=False, dedup=True, input_key="text"
)
We disabled rehashing because another service already computes the hashes when it prepares source.jsonl:
{"_input_hash": -1512905942, "_task_hash": -972217336, "task_key": "MULTILABEL_OC_Product / Brand_0", "text": "Excellent!", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_1", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_1_0_10", "ooc": "Product / Brand", "sentence": "Excellent!", "multilabel": true, "oc": "Formula"}}
{"_input_hash": -1778974475, "_task_hash": 1591342871, "task_key": "MULTILABEL_OC_Product / Brand_0", "text": "I love so much .", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_2", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_2_0_16", "ooc": "Product / Brand", "sentence": "I love so much .", "multilabel": true, "oc": "Formula"}}
{"_input_hash": 139019224, "_task_hash": 1049464764, "task_key": "MULTILABEL_OC_Product / Brand_0", "text": "You don't need too much, a little goes a long way and it glides on really smoothly without dragging yourcskin.", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_3", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_3_0_110", "ooc": "Product / Brand", "sentence": "You don't need too much, a little goes a long way and it glides on really smoothly without dragging yourcskin.", "multilabel": true, "oc": "Formula"}}
{"_input_hash": -1512905942, "_task_hash": 75143544, "task_key": "MULTILABEL_OC_Product / Brand_1", "text": "Excellent!", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_1", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_1_0_10", "ooc": "Product / Brand", "sentence": "Excellent!", "multilabel": true, "oc": "Formula"}}
{"_input_hash": -1778974475, "_task_hash": 1758367154, "task_key": "MULTILABEL_OC_Product / Brand_1", "text": "I love so much .", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_2", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_2_0_16", "ooc": "Product / Brand", "sentence": "I love so much .", "multilabel": true, "oc": "Formula"}}
{"_input_hash": 139019224, "_task_hash": -2001281296, "task_key": "MULTILABEL_OC_Product / Brand_1", "text": "You don't need too much, a little goes a long way and it glides on really smoothly without dragging yourcskin.", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_3", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_3_0_110", "ooc": "Product / Brand", "sentence": "You don't need too much, a little goes a long way and it glides on really smoothly without dragging yourcskin.", "multilabel": true, "oc": "Formula"}}
{"_input_hash": 139019224, "_task_hash": 541961746, "task_key": "MULTILABEL_SPAN_1", "text": "You don't need too much, a little goes a long way and it glides on really smoothly without dragging yourcskin.", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_3", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_3_0_110", "ooc": "Price", "sentence": "You don't need too much, a little goes a long way and it glides on really smoothly without dragging yourcskin.", "multilabel": true, "oc": "Price"}}
{"_input_hash": -1778974475, "_task_hash": 2130407409, "task_key": "MULTILABEL_OC_Effects_0", "text": "I love so much .", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_2", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_2_0_16", "ooc": "Effects", "sentence": "I love so much .", "multilabel": true, "oc": "Beauty: Brightening / Spot reduction"}}
{"_input_hash": -1778974475, "_task_hash": 770826728, "task_key": "MULTILABEL_OC_Effects_1", "text": "I love so much .", "meta": {"category": "FACE", "segment": "FACE CARE", "brand": "OLAY", "product": "Olay Regenerist Whip 50ml", "rating": 5, "sentence_id": "www.superdrug.com-107254778-8001090875266-Excellent!_2", "phrase_id": "www.superdrug.com-107254778-8001090875266-Excellent!_2_0_16", "ooc": "Effects", "sentence": "I love so much .", "multilabel": true, "oc": "Beauty: Brightening / Spot reduction"}}
In the recipe's return value, we also specified that tasks should be excluded by task hash rather than input hash:
return {
    "view_id": "choice" if has_options else "classification",
    "dataset": dataset,
    "stream": stream,
    "exclude": exclude,
    "validate_answer": validate_answer,
    "config": {
        "labels": labels,
        "choice_style": "single" if exclusive else "multiple",
        "choice_auto_accept": exclusive,
        "exclude_by": "task",
        "auto_count_stream": True,
    },
}
However, when we run the Prodigy instance, the recipe still seems to filter out duplicates by input hash rather than task hash.
We expected all 9 tasks to be shown to the annotators, since each task in source.jsonl has a different task hash, but only 3 are shown, which matches the number of distinct input hashes. Is this a bug? Shouldn't the recipe exclude by task hash, as we defined in the recipe configuration?
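To make the 3 vs. 9 count concrete, here is a minimal sketch in plain Python. It does not use Prodigy's API, and `dedup_by` is a hypothetical helper written only for illustration; it is not how Prodigy filters internally. The hash pairs are copied from the source.jsonl sample above, with the other fields omitted.

```python
from typing import Dict, Iterable, Iterator

# (_input_hash, _task_hash) pairs copied from the source.jsonl sample.
TASKS = [
    {"_input_hash": -1512905942, "_task_hash": -972217336},
    {"_input_hash": -1778974475, "_task_hash": 1591342871},
    {"_input_hash": 139019224, "_task_hash": 1049464764},
    {"_input_hash": -1512905942, "_task_hash": 75143544},
    {"_input_hash": -1778974475, "_task_hash": 1758367154},
    {"_input_hash": 139019224, "_task_hash": -2001281296},
    {"_input_hash": 139019224, "_task_hash": 541961746},
    {"_input_hash": -1778974475, "_task_hash": 2130407409},
    {"_input_hash": -1778974475, "_task_hash": 770826728},
]


def dedup_by(stream: Iterable[Dict[str, int]], key: str) -> Iterator[Dict[str, int]]:
    """Hypothetical helper: keep only the first task seen for each value of `key`."""
    seen = set()
    for task in stream:
        if task[key] in seen:
            continue  # a task with this hash was already emitted
        seen.add(task[key])
        yield task


by_input = list(dedup_by(TASKS, "_input_hash"))
by_task = list(dedup_by(TASKS, "_task_hash"))
print(len(by_input), len(by_task))  # prints "3 9"
```

Filtering on `_input_hash` leaves 3 tasks, which is what we see in the UI, while filtering on `_task_hash` keeps all 9, which is what we expected.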
Thank you for helping us out.