Just to make sure I understand the question correctly: The main problem in your case is that you're now presented examples for annotation again, since they're considered different (due to the new naming)?
Correct @ines, that's indeed the behavior I am seeing!
To reproduce: annotate a bunch of images, export the annotations to a JSONL file (annotations_with_old_naming.jsonl), rename the underlying images (also replacing the filenames in the JSONL file), and load the annotations into a new dataset with db-in. If you then start annotating with new_dataset, the examples are presented again in the Prodigy UI.
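In case it helps anyone reproducing this, the filename replacement in the JSONL file can be scripted. This is only a rough sketch using the standard library: it assumes each exported example stores the file reference as a plain string under an "image" key, and both the helper name rename_images and the old-to-new mapping are made up for illustration. If your export keeps the filename under a different key, pass that key instead.

```python
import json


def rename_images(jsonl_text, mapping, key="image"):
    """Return JSONL text where each example's `key` value is replaced via `mapping`.

    `mapping` is a dict of old filename -> new filename. Examples whose value
    is not in the mapping are passed through unchanged.
    """
    out = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        eg = json.loads(line)
        old = eg.get(key)
        if isinstance(old, str) and old in mapping:
            eg[key] = mapping[old]
        out.append(json.dumps(eg))
    return "\n".join(out)


if __name__ == "__main__":
    exported = (
        '{"image": "img_001.jpg", "answer": "accept"}\n'
        '{"image": "img_002.jpg", "answer": "accept"}'
    )
    print(rename_images(exported, {"img_001.jpg": "cat_001.jpg"}))
```

Note that this only rewrites the filenames. Any hashes already stored in the exported examples are left untouched, so they still reflect the old names.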
Following your advice, I tried:
import json
import prodigy
from prodigy.components.loaders import JSONL

# Before running this, I edited the JSONL file so that it contains the new filenames
jsonl_stream = JSONL("annotations_with_old_naming.jsonl")
# Recompute the hashes, overwriting the stale ones that came with the export
examples = [prodigy.set_hashes(eg, overwrite=True) for eg in jsonl_stream]
jsonl_data = "\n".join(json.dumps(eg) for eg in examples)
with open("annotations_rehashed.jsonl", "w") as f:
    f.write(jsonl_data)
That seems to have done the trick: after running
python -m prodigy db-in new_dataset annotations_rehashed.jsonl
and restarting annotation with image.manual, the already-annotated images don't show up anymore.
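A quick sanity check I would suggest before importing: compare the stored hashes in the file before and after re-hashing. The sketch below uses only the standard library and assumes the examples carry their hash under the "_input_hash" key. The helper names load_hashes and unchanged_hashes are hypothetical. Any hash that appears in both files belongs to an example whose hashed content did not change.

```python
import json


def load_hashes(jsonl_text, key="_input_hash"):
    """Collect the values stored under `key` across all JSONL lines."""
    hashes = {
        json.loads(line).get(key)
        for line in jsonl_text.splitlines()
        if line.strip()
    }
    return hashes - {None}  # ignore examples that have no hash at all


def unchanged_hashes(before_text, after_text, key="_input_hash"):
    """Return the hashes that appear both before and after re-hashing."""
    return load_hashes(before_text, key) & load_hashes(after_text, key)


if __name__ == "__main__":
    # Toy data with made-up hash values, not real Prodigy output
    before = '{"image": "a.jpg", "_input_hash": 1}\n{"image": "b.jpg", "_input_hash": 2}'
    after = '{"image": "c.jpg", "_input_hash": 3}\n{"image": "b.jpg", "_input_hash": 2}'
    print(unchanged_hashes(before, after))
```

If every image was renamed, I would expect this set to come back empty for the re-hashed file. A non-empty result just means some examples kept the same hashed content.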