Loading Multiple Files for ner.teach

Hi,

Do I have to create a new recipe to load multiple files during ner.teach? I have a corpus with varied files and I want to train the model on all of them. How do I do this without invoking the ner.teach command multiple times?

Thanks for the software and the help,
Sandeep

Prodigy’s built-in file loaders currently only support single files and not directories. But it’s pretty easy to load in your own corpora using custom ETL logic (even without a custom recipe).

The source argument you set on the command line is usually a path to a file – but if it's not set, it defaults to sys.stdin. This means you can pipe data through it, for example:

python load_my_corpus.py | prodigy ner.teach my_dataset en_core_web_sm

The load_my_corpus.py script can then load and pre-process the corpus, and print the JSON-dumped annotation tasks to stdout. Here’s some pseudocode to illustrate the idea:

import json

for data_file in corpus:  # "corpus" is whatever iterable of files you have
    data = preprocess_your_data_somehow(data_file)
    for line in data:
        # each annotation task is a dict with at least a "text" key
        task = {'text': line}
        print(json.dumps(task))

Depending on the complexity of your corpus, you can add rules to handle files differently depending on their type, or read out different fields. You could also import and re-use Prodigy’s built-in loaders in prodigy.components.loaders – see the “Loaders” section in the PRODIGY_README.html for more details and examples.
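For example, if some of your corpus files are already in JSONL format, you could re-use the built-in JSONL loader for those instead of parsing them yourself. Here's a minimal sketch – the directory name is just a placeholder, and the loader yields one task dict per line:

import json
from pathlib import Path
from prodigy.components.loaders import JSONL

corpus_dir = Path('my_corpus')  # placeholder path to your corpus directory

for data_file in corpus_dir.glob('*.jsonl'):
    for task in JSONL(str(data_file)):  # yields dicts in Prodigy's task format
        print(json.dumps(task))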

Thanks! I tried sending in whole files here, and it worked. Pasting the code below in case someone else needs it.

import glob2
import json
import os

corpusfolderpath = 'xyz'  # path to your corpus folder

for filename in glob2.glob(corpusfolderpath + '/*'):
    if os.path.isfile(filename):
        # open in text mode – bytes from 'rb' can't be serialized by json.dumps
        with open(filename, 'r', encoding='utf-8') as f:
            text = f.read()
            task = {'text': text}
            print(json.dumps(task))

Do you know what happens if a sentence is already in the annotations dataset? Is it ignored, or do we get asked again?

By default, Prodigy doesn’t make any assumptions about this and will let you re-annotate the same task. But you can tell it to exclude annotations of existing datasets by setting --exclude dataset_name (or multiple, comma-separated names). This is also very useful when creating evaluation sets.
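For example, to avoid re-annotating anything already in your dataset (the dataset and file names here are placeholders):

prodigy ner.teach my_dataset en_core_web_sm my_data.jsonl --exclude my_dataset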

The tasks are compared based on their hashes. When a new annotation task comes in, Prodigy assigns an "_input_hash" to the task, based on its content – by default, properties like "text". When you run ner.teach, Prodigy will add "spans" to each task containing the entity you're annotating. The input hash and the annotation features are then hashed again to create a "_task_hash", which is used to determine whether two annotation tasks are the same.

This means that Prodigy will exclude tasks asking the same questions – but still allow different questions about the same text that you haven’t answered before. You can find more details on the hashing in the PRODIGY_README.html, for example in the API docs of the set_hashes helper function.
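To see the distinction in action, you can hash two tasks yourself with the set_hashes helper – a minimal sketch, assuming the default hashing properties ("text" for the input hash, "spans" etc. for the task hash):

from prodigy import set_hashes

# same text, with and without a pre-annotated span
task_a = set_hashes({'text': 'Apple is a company'})
task_b = set_hashes({'text': 'Apple is a company',
                     'spans': [{'start': 0, 'end': 5, 'label': 'ORG'}]})

# identical text means identical input hashes ...
assert task_a['_input_hash'] == task_b['_input_hash']
# ... but different spans mean different task hashes
assert task_a['_task_hash'] != task_b['_task_hash']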
