We have a problem with duplicates. We used ner.manual for annotation and, after about 6,000 annotations, changed the annotation schema, which is why we wanted to go over those 6,000 examples again. We did this by exporting the JSONL with db-out and using it as the input file for the revised annotations (again using ner.manual, but saving into a new, empty database).
Now, we exported this new db_2 and realized that it contains 12,000 annotations, i.e. every annotation exists twice, even with the same input_hash and task_hash. How could that happen, and can we do something directly in Prodigy to fix this issue? And how do we know which examples are the revised ones? Is it the second half of the annotations, i.e. annotations number 6,001 to 12,000?
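In case it helps to reproduce this, here is a minimal sketch of how the duplication can be checked outside Prodigy. It assumes the export is a JSONL file with one example per line and that each example stores its hash under the `_task_hash` key, as in our export. The function names are just made up for illustration. Whether the order of lines in the export matches the order in which the annotations were saved is an assumption, and it is exactly the part we are unsure about.

```python
import json
from collections import Counter


def count_task_hashes(lines):
    """Count how often each _task_hash occurs in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        counts[json.loads(line)["_task_hash"]] += 1
    return counts


def split_by_occurrence(lines):
    """Split examples into first occurrences and later repeats, in file order.

    Assumption: if file order equals annotation order, the repeats would be
    the revised annotations. This is not confirmed.
    """
    seen = set()
    first, later = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        example = json.loads(line)
        key = example["_task_hash"]
        if key in seen:
            later.append(example)
        else:
            first.append(example)
            seen.add(key)
    return first, later
```

Both functions take any iterable of lines, so an open file handle for the exported JSONL can be passed in directly. In our case every hash comes back with a count of 2.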
Thanks in advance!