Merging datasets of same input data to combine separately annotated entities

Hi @DGMS90,

The db-merge command only concatenates the datasets. It does not merge the annotated spans of examples that share the same _input_hash, so what you're observing is the expected behavior.

Prodigy only merges annotations on the same input when you train with train or export with data-to-spacy. These two commands also resolve conflicting annotations, e.g. when spans overlap they keep the longer one. So you might as well store your datasets separately and merge only when you're ready to train.
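To make the "keep the longer one" idea concrete, here is a minimal sketch in plain Python. It is not Prodigy's actual implementation, just an illustration of the rule; the function name resolve_overlaps is made up, and it assumes spans are dicts with character offsets in "start" and "end", as in Prodigy's JSON format.

```python
def resolve_overlaps(spans):
    """Greedily keep the longest spans and drop any span that overlaps one already kept.

    Illustration only: this is an assumption about what "selecting the longer
    one" means, not Prodigy's real code.
    """
    # Longest first; ties are broken by the earlier start offset.
    ordered = sorted(spans, key=lambda s: (-(s["end"] - s["start"]), s["start"]))
    kept = []
    for span in ordered:
        # Two spans are disjoint if one ends before (or where) the other starts.
        if all(span["end"] <= k["start"] or span["start"] >= k["end"] for k in kept):
            kept.append(span)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: s["start"])


spans = [
    {"start": 0, "end": 5, "label": "A"},
    {"start": 3, "end": 12, "label": "B"},   # overlaps A and is longer
    {"start": 20, "end": 25, "label": "C"},  # overlaps nothing
]
resolved = resolve_overlaps(spans)
```

Here A is dropped because it overlaps the longer B, and C is kept. If you work with spaCy Span objects rather than dicts, spacy.util.filter_spans applies a similar longest-span-first rule.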

You can of course merge them yourself if you prefer. In this post Ines provides some code snippets for this that should be helpful, and there are some more relevant comments here.
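If it helps, a manual merge could look roughly like the sketch below. This is not the snippet from the linked post, only a minimal illustration. It assumes you have exported each dataset to a list of dicts (for example via db-out), that every example carries "_input_hash" and optionally "spans" and "answer", and that you only want accepted annotations. The function name merge_by_input_hash is made up for this example.

```python
def merge_by_input_hash(*datasets):
    """Combine the spans of all accepted examples that share an _input_hash."""
    merged = {}
    for examples in datasets:
        for eg in examples:
            # Assumption: skip rejected/ignored answers; a missing answer counts as accepted.
            if eg.get("answer", "accept") != "accept":
                continue
            key = eg["_input_hash"]
            if key not in merged:
                # Copy the first example seen for this input and start with no spans.
                merged[key] = {**eg, "spans": []}
                # The spans will change, so the old task hash no longer fits.
                merged[key].pop("_task_hash", None)
            target = merged[key]["spans"]
            seen = {(s["start"], s["end"], s["label"]) for s in target}
            for span in eg.get("spans", []):
                ident = (span["start"], span["end"], span["label"])
                if ident not in seen:  # avoid exact duplicates
                    target.append(dict(span))
                    seen.add(ident)
    for eg in merged.values():
        eg["spans"].sort(key=lambda s: (s["start"], s["end"]))
    return list(merged.values())


# Two hypothetical datasets annotated separately on the same text.
persons = [{"_input_hash": 1, "text": "Ada met Bob in Paris", "answer": "accept",
            "spans": [{"start": 0, "end": 3, "label": "PERSON"}]}]
places = [{"_input_hash": 1, "text": "Ada met Bob in Paris", "answer": "accept",
           "spans": [{"start": 15, "end": 20, "label": "GPE"}]}]
combined = merge_by_input_hash(persons, places)
```

The result is a single example for the shared input that holds both the PERSON and the GPE span. The sketch does not resolve overlapping spans, and the dropped _task_hash would need to be recomputed (e.g. with Prodigy's set_hashes) before using the examples in Prodigy again.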
