I think you somehow ended up with slightly messy datasets that mix annotations of different types and from different processes. Ideally, you want a separate dataset for each annotation experiment. If you mix annotations from, say, `ner.manual` (fully manual, all entities gold-standard, no missing values) with annotations from `ner.teach` (binary, only one span at a time, all other tokens missing values) and put them all in the same set, you won't be able to train a useful model from them: there's no way to tell which examples are gold-standard and which aren't, and you might even end up with a bunch of conflicts.
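To illustrate the difference, here's a rough sketch of what the two record types look like – the texts, spans and labels are made up, and the dicts are simplified (manual records also carry token information, for example):

```python
# Rough illustration only – field values are invented for this example.
manual_example = {
    "text": "Apple is opening a store in Berlin",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 28, "end": 34, "label": "GPE"},
    ],
    "answer": "accept",        # the spans together are the gold annotation
    "_view_id": "ner_manual",
}

binary_example = {
    "text": "Apple is opening a store in Berlin",
    "spans": [{"start": 0, "end": 5, "label": "GPE"}],  # one suggested span
    "answer": "reject",        # only says this single suggestion was wrong
    "_view_id": "ner",
}
```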
I'd recommend exporting the data you have and going through it, either in the JSON file directly or with a Python script, to see if you can clean it up a bit. The `_view_id` of each record will tell you the ID of the annotation interface, so you'll probably want to separate the examples created with `ner` (binary) from the ones created with `ner_manual` (manual).
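For example, assuming you've exported the dataset with `db-out` to a file like `mixed_dataset.jsonl` (the file names here are just placeholders), a quick script along these lines could split the records by their `_view_id`:

```python
import json

# Split an exported JSONL dataset by the interface the examples were created with.
manual_examples = []
binary_examples = []

with open("mixed_dataset.jsonl", encoding="utf8") as f:
    for line in f:
        example = json.loads(line)
        if example.get("_view_id") == "ner_manual":
            manual_examples.append(example)
        elif example.get("_view_id") == "ner":
            binary_examples.append(example)

with open("ner_manual_examples.jsonl", "w", encoding="utf8") as f:
    for example in manual_examples:
        f.write(json.dumps(example) + "\n")

with open("ner_binary_examples.jsonl", "w", encoding="utf8") as f:
    for example in binary_examples:
        f.write(json.dumps(example) + "\n")
```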
Each example will also have an `_input_hash`, so you can identify annotations created on the same input text. You can also call `prodigy.set_hashes(example, overwrite=True)` on each example to make sure you have no stale hashes, and then use the `_task_hash` to find duplicates.
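As a rough sketch, re-hashing the exported examples and keeping only one per `_task_hash` could look like this (again, the file name is a placeholder):

```python
import json
import prodigy

# Recompute hashes and keep only one example per unique _task_hash.
seen_task_hashes = set()
deduped = []

with open("mixed_dataset.jsonl", encoding="utf8") as f:
    for line in f:
        example = json.loads(line)
        example = prodigy.set_hashes(example, overwrite=True)  # refresh stale hashes
        if example["_task_hash"] not in seen_task_hashes:
            seen_task_hashes.add(example["_task_hash"])
            deduped.append(example)
```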