Loading Multiple Files for ner.teach

By default, Prodigy doesn’t make any assumptions about this and will let you re-annotate the same task. But you can tell it to exclude tasks that are already annotated in existing datasets by setting --exclude dataset_name (or multiple comma-separated names). This is also very useful when creating evaluation sets.

The tasks are compared based on their hashes. When a new annotation task comes in, Prodigy assigns an "_input_hash" to the task, based on its content – by default, properties like "text". When you run ner.teach, Prodigy will add "spans" to each task containing the entity you’re annotating. The input hash and the annotation features are then hashed again to create a "_task_hash", which is used to determine whether two annotation tasks are the same.
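To make the two levels of hashing concrete, here is a minimal sketch in plain Python. It is not Prodigy's actual implementation: the hash_of and add_hashes helpers, the choice of MD5 and the key lists are all assumptions made for illustration. It only shows the idea that the input hash depends on the text, while the task hash also depends on the annotation features.

```python
import hashlib
import json


def hash_of(obj):
    """Stable hash of a JSON-serializable value (illustrative only)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf8")
    return hashlib.md5(payload).hexdigest()


def add_hashes(task, input_keys=("text",), task_keys=("spans", "label")):
    """Hypothetical helper: return a copy of the task with both hashes set."""
    task = dict(task)
    # The input hash only looks at the raw input, e.g. the text.
    input_part = {k: task[k] for k in input_keys if k in task}
    task["_input_hash"] = hash_of(input_part)
    # The task hash combines the input hash with the annotation features.
    task_part = {k: task[k] for k in task_keys if k in task}
    task["_task_hash"] = hash_of([task["_input_hash"], task_part])
    return task


text = "Apple hires Tim Cook"
org = add_hashes({"text": text, "spans": [{"start": 0, "end": 5, "label": "ORG"}]})
person = add_hashes({"text": text, "spans": [{"start": 12, "end": 20, "label": "PERSON"}]})

# Same text, so the same input hash; different spans, so different task hashes.
print(org["_input_hash"] == person["_input_hash"])
print(org["_task_hash"] == person["_task_hash"])
```

Both tasks share an input hash because the text is identical, but they get different task hashes because each one asks about a different span.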

This means that Prodigy will exclude tasks asking the same questions – but still allow different questions about the same text that you haven’t answered before. You can find more details on the hashing in the PRODIGY_README.html, for example in the API docs of the set_hashes helper function.
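As a rough illustration of what that exclusion amounts to, the sketch below filters an incoming stream against the task hashes of already annotated examples. The filter_seen function and the hand-assigned hash values are hypothetical and only stand in for what Prodigy does internally when you set --exclude.

```python
def filter_seen(stream, seen_task_hashes):
    """Yield only tasks whose _task_hash has not been annotated yet."""
    for task in stream:
        if task["_task_hash"] not in seen_task_hashes:
            yield task


# Hand-assigned hashes for illustration; real ones are computed from the task.
annotated = [
    {"text": "Apple hires Tim Cook", "_input_hash": 1, "_task_hash": 10},
]
incoming = [
    # Same text and same question as before: excluded.
    {"text": "Apple hires Tim Cook", "_input_hash": 1, "_task_hash": 10},
    # Same text, but a different question: kept.
    {"text": "Apple hires Tim Cook", "_input_hash": 1, "_task_hash": 11},
    # New text: kept.
    {"text": "Berlin is a city", "_input_hash": 2, "_task_hash": 20},
]

seen = {task["_task_hash"] for task in annotated}
remaining = list(filter_seen(incoming, seen))
print([task["_task_hash"] for task in remaining])
```

Filtering on the task hash rather than the input hash is what lets a new question about an already seen text through.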
