How to merge data from ner.correct and ner.teach?

ines · November 9, 2020, 1:34am

Hi, I hope I understand your question correctly! When you run prodigy train, the examples in the dataset will be merged to reflect the unique examples, and all annotations that are available for a given example will be combined to create the final training example.

However, mixing annotations of different types (binary and manual) in the same dataset can lead to unexpected results and means you won't be able to update the model as effectively. To train from binary yes/no answers, you want to update the model differently: the rejected answers should be taken into account, and all unannotated tokens should be treated as unknown. This is what happens when you set --binary on prodigy train. If you train from complete gold-standard annotations created with ner.correct, you typically want to treat all unannotated tokens as non-entity tokens, which makes it easier for the model to learn. So we typically recommend keeping those types of annotations separate.
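To make the difference concrete, here is a small illustrative sketch of the two ways of reading the same annotation. It is not how Prodigy or spaCy implement this internally. The helper name token_labels and the use of None for "unknown" are assumptions made for the example, and the spans are assumed to carry inclusive token_start and token_end indices.

```python
def token_labels(tokens, spans, assume_complete):
    """Return one label per token for a single annotated example.

    Hypothetical helper for illustration only.
    assume_complete=True  -> gold-standard reading: tokens outside any span are "O" (non-entity).
    assume_complete=False -> binary reading: tokens outside any span are None (unknown).
    """
    fill = "O" if assume_complete else None
    labels = [fill] * len(tokens)
    for span in spans:
        # token_end is treated as inclusive here (an assumption of this sketch)
        for i in range(span["token_start"], span["token_end"] + 1):
            labels[i] = span["label"]
    return labels


tokens = ["Apple", "hired", "Ines"]
spans = [{"token_start": 0, "token_end": 0, "label": "ORG"}]

gold_view = token_labels(tokens, spans, assume_complete=True)
binary_view = token_labels(tokens, spans, assume_complete=False)
```

In the gold-standard reading, "Ines" is explicitly marked as not an entity. In the binary reading, nothing is claimed about it, which is why the two kinds of data call for different update strategies.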

So one option would be to just use the metadata of the exported annotations to separate them into two sets and then re-import the data. Also see this thread for more details:
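As a rough sketch of that approach, the script below splits exported annotations into a binary and a manual set. It assumes each exported record carries a "_view_id" key, with "ner" for binary ner.teach answers and "ner_manual" for ner.correct annotations. Check a few of your own records first, since the values may differ across versions and recipes. The function and file names are made up for the example.

```python
import json


def read_jsonl(path):
    """Load one JSON object per non-empty line."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, examples):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")


def split_by_view_id(examples, binary_ids=("ner",), manual_ids=("ner_manual",)):
    """Separate examples by annotation interface.

    The "_view_id" values are assumptions; anything unrecognised is
    returned separately so it can be inspected instead of silently dropped.
    """
    binary, manual, other = [], [], []
    for eg in examples:
        view_id = eg.get("_view_id")
        if view_id in binary_ids:
            binary.append(eg)
        elif view_id in manual_ids:
            manual.append(eg)
        else:
            other.append(eg)
    return binary, manual, other


if __name__ == "__main__":
    # Hypothetical file names; the input is an export of the mixed dataset.
    examples = read_jsonl("mixed_annotations.jsonl")
    binary, manual, other = split_by_view_id(examples)
    write_jsonl("binary_annotations.jsonl", binary)
    write_jsonl("manual_annotations.jsonl", manual)
    print(f"binary: {len(binary)}, manual: {len(manual)}, unrecognised: {len(other)}")
```

The input file can be created with prodigy db-out, and the two output files can be loaded into new, separate datasets with prodigy db-in. You can then train on each dataset on its own, using --binary only for the binary one.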
