Prodigy's EntityRecognizer model was developed specifically for Prodigy, so it's also a little more specific in terms of the input it expects.
The input hash is generated from the input data, e.g. the text or the image, and lets Prodigy identify tasks with the same input (but potentially different labels or spans). Additionally, Prodigy generates a task hash based on the input hash and the features you're annotating, e.g. the spans, labels etc. This lets you tell whether two examples ask the exact same question. You can also use the set_hashes helper to take care of the hashing for you:
from prodigy import set_hashes
examples = [set_hashes(eg) for eg in examples]
You can also set the additional keyword arguments input_keys and task_keys, both lists of the keys you want to take into account when hashing. For example, input_keys=('text', 'custom_text'). The full docs are available in the PRODIGY_README.html.
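To make the two levels of hashing a bit more concrete, here's a rough sketch in plain Python. This is not Prodigy's actual implementation: sketch_set_hashes and _stable_hash are hypothetical names made up for illustration, the use of SHA-256 over JSON is an assumption, and the default keys are only examples. The only things taken from Prodigy are the idea of hashing the input first and then the task on top of it, and the _input_hash and _task_hash keys on the example.

```python
import hashlib
import json


def _stable_hash(value):
    """Hypothetical helper: stable integer hash of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf8")
    return int(hashlib.sha256(payload).hexdigest(), 16)


def sketch_set_hashes(eg, input_keys=("text", "image"), task_keys=("spans", "label")):
    """Illustrative stand-in for set_hashes (assumed logic, not Prodigy's code)."""
    eg = dict(eg)  # copy so the original example is left untouched
    # The input hash only looks at the raw input, e.g. the text or the image.
    input_values = [[key, eg[key]] for key in input_keys if key in eg]
    eg["_input_hash"] = _stable_hash(input_values)
    # The task hash builds on the input hash plus the features being annotated.
    task_values = [[key, eg[key]] for key in task_keys if key in eg]
    eg["_task_hash"] = _stable_hash([eg["_input_hash"], task_values])
    return eg


# Same text, different labels: same input, but two different questions.
a = sketch_set_hashes({"text": "Apple is a company", "label": "ORG"})
b = sketch_set_hashes({"text": "Apple is a company", "label": "PRODUCT"})

# A custom key only affects the input hash if it is listed in input_keys.
c = sketch_set_hashes(
    {"text": "Apple is a company", "custom_text": "extra"},
    input_keys=("text", "custom_text"),
)
```

Here a and b end up with the same _input_hash but different _task_hash values, while c gets a different _input_hash than a because custom_text is included in the hashed keys.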
Yes, but this is a little more complex. @honnibal wrote a more detailed reply on this here: