Hi @DGMS90,
The `merge_db` command actually just concatenates the datasets. It does not merge the annotated spans that share the same input_hash, so what you're observing is correct.
Prodigy only merges the annotations before training with `train` and when exporting with `data-to-spacy`. These two commands also take care of resolving conflicting annotations, e.g. overlapping spans, by selecting the longer one. So you might as well store your datasets separately and only merge when you're ready to train.
You can of course merge the annotations yourself, if you prefer. In this post Ines provides some code snippets for this that should be helpful, and there are some more relevant comments here.
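
If it helps, here is a rough sketch of what merging yourself could look like. This is not Prodigy's internal logic and not the code from the linked post, just a plain-Python illustration. It assumes your examples are dicts (for instance loaded from an exported JSONL file) with an `_input_hash` key and a `spans` list whose entries have `start`, `end` and `label`. The function names `merge_spans_by_input_hash` and `resolve_overlaps` are made up for this example.

```python
def resolve_overlaps(spans):
    """Keep a non-overlapping subset of spans, preferring longer ones.

    Assumption: spans are dicts with character offsets "start" and "end".
    If two spans have identical offsets, the one seen first is kept.
    """
    # Longest spans first; ties are broken by start offset.
    ordered = sorted(spans, key=lambda s: (-(s["end"] - s["start"]), s["start"]))
    kept = []
    for span in ordered:
        # A span is kept only if it does not overlap anything kept so far.
        if all(span["end"] <= k["start"] or span["start"] >= k["end"] for k in kept):
            kept.append(span)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: s["start"])


def merge_spans_by_input_hash(examples):
    """Group examples by "_input_hash" and combine their spans.

    The first example seen for each hash provides the other fields
    (text, meta, ...). Spans from later examples are added to it.
    """
    merged = {}
    for eg in examples:
        key = eg["_input_hash"]
        if key not in merged:
            # Shallow copy so the input example is not modified.
            merged[key] = {**eg, "spans": []}
        merged[key]["spans"].extend(eg.get("spans", []))
    for eg in merged.values():
        eg["spans"] = resolve_overlaps(eg["spans"])
    return list(merged.values())
```

You would call `merge_spans_by_input_hash` on the combined list of examples from all your datasets and get back one example per input. Note that the "prefer the longer span" rule here only mirrors the behaviour described above in a simplified way, so it's worth checking the result on a few examples before training on it.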