Difference between Input hash and task hash

ines · July 22, 2020, 11:40am

Hi! You can read more about the hashing here: https://prodi.gy/docs/api-loaders#hashing

A simple example: you might be annotating data for text classification with two categories, so you make two passes over the data, one with LABEL_A and one with LABEL_B. Each time, you accept/reject whether the label applies. This means you end up with two examples for each text: one with LABEL_A and one with LABEL_B. Both examples will have the same input hash, because they're questions about the same input data, the text. But they will have different task hashes, because they're different questions.

{"text": "Text", "label": "LABEL_A", "_input_hash": 1, "_task_hash": 2}
{"text": "Text", "label": "LABEL_B", "_input_hash": 1, "_task_hash": 3}
{"text": "Other", "label": "LABEL_A", "_input_hash": 2, "_task_hash": 4}

Using the input hashes and task hashes, Prodigy (or you) can also figure out whether two annotations are on the same data and use this information to merge your examples later on. For example, data-to-spacy will group all annotations with the same input hash together, so you'll get one example annotated with all categories, entities, POS tags or whatever else you labelled.
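
If you want to do this kind of grouping yourself, it could look roughly like the sketch below. It only illustrates the idea of merging on `_input_hash` and is not how `data-to-spacy` is implemented. The `"answer"` values and the `"cats"` output format are assumptions I've added for the example.

```python
from collections import defaultdict

# Toy annotations with made-up hash values and an assumed "answer" field.
annotations = [
    {"text": "Text", "label": "LABEL_A", "answer": "accept", "_input_hash": 1, "_task_hash": 2},
    {"text": "Text", "label": "LABEL_B", "answer": "reject", "_input_hash": 1, "_task_hash": 3},
    {"text": "Other", "label": "LABEL_A", "answer": "accept", "_input_hash": 2, "_task_hash": 4},
]

# Group all annotations that were made on the same input.
by_input = defaultdict(list)
for eg in annotations:
    by_input[eg["_input_hash"]].append(eg)

# Merge each group into one example that carries all category decisions.
merged = []
for input_hash, egs in by_input.items():
    cats = {eg["label"]: eg["answer"] == "accept" for eg in egs}
    merged.append({"text": egs[0]["text"], "cats": cats})

for eg in merged:
    print(eg)
```

Because both annotations on "Text" share an input hash, they end up in one merged example with a decision for each label, while "Other" gets its own.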
