Should _input_hash be required on the input to EntityRecognizer?

ines · January 9, 2018, 11:54pm

Prodigy's EntityRecognizer model was developed specifically for Prodigy, so it's also a little more specific in terms of the input it expects.

The input hash is generated from the input data, e.g. the text or the image, and lets Prodigy recognize tasks that share the same input (but potentially have different labels or spans). Prodigy also generates a task hash based on the input hash and the features you're annotating, e.g. the spans, labels etc. This lets you tell whether two tasks ask exactly the same question. You can also use the set_hashes helper to take care of the hashing for you:

from prodigy import set_hashes

# Adds "_input_hash" and "_task_hash" to each example dict
examples = [set_hashes(eg) for eg in examples]

You can also set the additional keyword arguments input_keys and task_keys, each a list of the keys you want to take into account when hashing, for example input_keys=('text', 'custom_text'). The full docs are available in the PRODIGY_README.html.
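To make the relationship between the two hashes more concrete, here is a minimal sketch in plain Python. It is not Prodigy's actual implementation: sketch_hash and sketch_set_hashes are hypothetical helpers, and the use of MD5 over JSON and the default key tuples are assumptions made purely for illustration. It only shows the idea that the input hash depends on the input keys, while the task hash depends on the input hash plus the task keys.

```python
import hashlib
import json


def sketch_hash(value):
    """Stable hash of a JSON-serializable value (illustrative only)."""
    payload = json.dumps(value, sort_keys=True).encode("utf8")
    return hashlib.md5(payload).hexdigest()


def sketch_set_hashes(eg, input_keys=("text", "image"), task_keys=("spans", "label")):
    """Hypothetical stand-in for set_hashes; NOT Prodigy's real implementation."""
    eg = dict(eg)  # copy so the caller's dict is left untouched
    # The input hash only looks at the raw input, e.g. the text or image.
    input_part = {k: eg[k] for k in input_keys if k in eg}
    eg["_input_hash"] = sketch_hash(input_part)
    # The task hash combines the input hash with what is being annotated.
    task_part = {k: eg[k] for k in task_keys if k in eg}
    eg["_task_hash"] = sketch_hash([eg["_input_hash"], task_part])
    return eg


a = sketch_set_hashes({"text": "Apple is a company", "label": "ORG"})
b = sketch_set_hashes({"text": "Apple is a company", "label": "PRODUCT"})
c = sketch_set_hashes({"text": "Apple is a fruit", "label": "ORG"})

print(a["_input_hash"] == b["_input_hash"])  # same text, so the input hashes match
print(a["_task_hash"] == b["_task_hash"])    # different label, so the task hashes differ
print(a["_input_hash"] == c["_input_hash"])  # different text, so the input hashes differ
```

In this sketch, a and b count as the same input but different questions, which is the distinction the two hashes are meant to capture.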

Yes, but this is a little more complex. @honnibal wrote a more detailed reply on this here:
