Difference between Input hash and task hash

daniyalSelani · July 22, 2020, 11:13am

I have been working with the Prodigy DB, and I'm confused about the difference between _input_hash and _task_hash, which are assigned to each task.
I couldn't find anything satisfactory in the documentation.
Can you please help?

ines · July 22, 2020, 11:40am

Hi! You can read more about the hashing here: https://prodi.gy/docs/api-loaders#hashing

A simple example: you might be annotating data for text classification with two categories, so you make two passes over the data, one with LABEL_A and one with LABEL_B. Each time, you accept/reject whether the label applies. This means you end up with two examples for each text: one with LABEL_A and one with LABEL_B. Both examples will have the same input hash, because they're questions about the same input data, the text. But they will have different task hashes, because they're different questions.

{"text": "Text", "label": "LABEL_A", "_input_hash": 1, "_task_hash": 2}
{"text": "Text", "label": "LABEL_B", "_input_hash": 1, "_task_hash": 3}
{"text": "Other", "label": "LABEL_A", "_input_hash": 2, "_task_hash": 4}
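
To make the relationship concrete, here is a minimal sketch of how two hashes like these could be derived. It is only an illustration: `toy_hash` and `add_toy_hashes` are hypothetical helpers, the choice of MD5 and of the "text" and "label" keys is an assumption, and none of this is Prodigy's actual hashing code (Prodigy has its own `set_hashes` helper, which this sketch does not call). The idea is that the input hash depends only on the raw input, while the task hash depends on the input hash plus whatever defines the question.

```python
import hashlib
import json


def toy_hash(payload: dict) -> int:
    """Hypothetical stand-in hash; NOT the function Prodigy uses."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return int(hashlib.md5(encoded).hexdigest(), 16)


def add_toy_hashes(task: dict, input_keys=("text",), task_keys=("label",)) -> dict:
    """Return a copy of the task with illustrative _input_hash and _task_hash."""
    # The input hash only looks at the raw input (here: the text).
    input_part = {k: task[k] for k in input_keys if k in task}
    input_hash = toy_hash(input_part)
    # The task hash combines the input hash with the question being asked
    # (here: the label), so the same text with another label differs.
    task_part = {k: task[k] for k in task_keys if k in task}
    task_hash = toy_hash({"input": input_hash, **task_part})
    return {**task, "_input_hash": input_hash, "_task_hash": task_hash}


a = add_toy_hashes({"text": "Text", "label": "LABEL_A"})
b = add_toy_hashes({"text": "Text", "label": "LABEL_B"})
c = add_toy_hashes({"text": "Other", "label": "LABEL_A"})

print(a["_input_hash"] == b["_input_hash"])  # same text, same input
print(a["_task_hash"] == b["_task_hash"])    # different label, different question
print(a["_input_hash"] == c["_input_hash"])  # different text, different input
```

The small integers in the three records above are placeholders for readability; real hashes are much larger numbers.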

Using the input hashes and task hashes, Prodigy (or you) can also figure out whether two annotations are on the same data and use this information to merge your examples later on. For example, data-to-spacy will group all annotations with the same input hash together, so you'll get one example annotated with all categories, entities, POS tags or whatever else you labelled.
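
If you want to do this kind of grouping yourself, a rough sketch could look like the following. It reuses the three example records from above and adds an "answer" field with made-up accept/reject values, which is an assumption for illustration. `merge_by_input` is a hypothetical helper, not part of Prodigy, and it is not how data-to-spacy is implemented; it only shows the idea of collecting all annotations that share an input hash.

```python
def merge_by_input(examples: list) -> dict:
    """Group annotations by _input_hash and collect the labels per text."""
    merged = {}
    for eg in examples:
        entry = merged.setdefault(
            eg["_input_hash"],
            {"text": eg["text"], "accepted": [], "rejected": []},
        )
        # Sort each label into accepted or rejected for this input.
        key = "accepted" if eg["answer"] == "accept" else "rejected"
        entry[key].append(eg["label"])
    return merged


# The "answer" values are invented for this example.
examples = [
    {"text": "Text", "label": "LABEL_A", "_input_hash": 1, "_task_hash": 2, "answer": "accept"},
    {"text": "Text", "label": "LABEL_B", "_input_hash": 1, "_task_hash": 3, "answer": "reject"},
    {"text": "Other", "label": "LABEL_A", "_input_hash": 2, "_task_hash": 4, "answer": "accept"},
]

merged = merge_by_input(examples)
for input_hash, entry in merged.items():
    print(input_hash, entry)
```

The two annotations on "Text" end up in a single entry because they share an input hash, even though their task hashes differ.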
