Merging datasets of same input data to combine separately annotated entities

magdaaniol · February 14, 2025, 4:02pm

The db-merge command only concatenates the datasets. It does not merge the annotated spans of examples that share the same input_hash, so what you're observing is expected.

Prodigy only merges the annotations when you train with train or export with data-to-spacy. These two commands also take care of resolving conflicting annotations, e.g. overlapping spans, by selecting the longer one. So you might as well store your datasets separately and only merge when you're ready to train.

You can of course merge the annotations yourself, if that's preferred. In this post Ines provides some code snippets for this that should be helpful, and there are some more relevant comments here.
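If you do want to merge by hand, here is a minimal sketch of the idea. It is not the code from the linked post and not Prodigy's internal logic. It assumes each example is a dict with an `_input_hash`, an `answer` and a list of `spans` with character `start`/`end` offsets (for instance, loaded from a `db-out` JSONL export). The function names `merge_by_input_hash` and `resolve_overlaps` are made up for illustration, and the "keep the longer span" rule only imitates the behaviour described above.

```python
def resolve_overlaps(spans):
    """Keep the longer span whenever two spans overlap (character offsets)."""
    kept = []
    # Visit the longest spans first so they win against shorter overlapping ones.
    for span in sorted(spans, key=lambda s: (s["start"] - s["end"], s["start"])):
        if all(span["end"] <= k["start"] or span["start"] >= k["end"] for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s["start"])


def merge_by_input_hash(*datasets):
    """Combine the spans of accepted examples that share the same _input_hash."""
    merged = {}
    for examples in datasets:
        for eg in examples:
            # Assumption: only accepted answers should contribute spans.
            if eg.get("answer", "accept") != "accept":
                continue
            key = eg["_input_hash"]
            if key not in merged:
                # Copy the first example seen for this input, with empty spans.
                merged[key] = {**eg, "spans": []}
            merged[key]["spans"].extend(eg.get("spans", []))
    for eg in merged.values():
        eg["spans"] = resolve_overlaps(eg["spans"])
    return list(merged.values())


# Two hypothetical datasets annotated separately on the same input text.
text = "Ada Lovelace lived in London."
persons = [{"text": text, "_input_hash": 1, "answer": "accept",
            "spans": [{"start": 0, "end": 12, "label": "PERSON"}]}]
places = [{"text": text, "_input_hash": 1, "answer": "accept",
           "spans": [{"start": 22, "end": 28, "label": "GPE"}]}]

merged = merge_by_input_hash(persons, places)
print(merged[0]["spans"])
```

The merged example keeps the remaining fields (including any task hash) of the first example seen for that input, so you may want to recompute hashes before loading the result into a new dataset.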
