I annotated about 400 images and ignored about 500, saving the annotations to a dataset called coats-jackets-4200. Today I wanted to continue labeling and launched Prodigy again on the same dataset (coats-jackets-4200). I'm seeing a bunch of images that I have already labeled. What is the solution to this? Or am I doing something wrong?
I've read that this should be handled automatically by Prodigy. Is there something wrong with the hashes? I haven't changed anything in the recipe or elsewhere since restarting the server.
I have tried to implement a solution I found on this forum that filters the stream based on the input hashes, but it is not working.
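import prodigy
from prodigy.components.db import connect
from prodigy.components.filters import filter_inputs
from prodigy.components.loaders import Images

# OPTIONS (the list of choice options) is defined elsewhere in my recipe file, not shown here


@prodigy.recipe("classify-images")
def classify_images(dataset, source):
    # In your recipe function
    db = connect()
    # Input hashes of everything already saved in this dataset
    input_hashes = db.get_input_hashes(dataset)

    def get_stream():
        # Load the directory of images and add options to each task
        stream = Images(source)
        # Skip tasks whose input hash is already in the dataset
        stream = filter_inputs(stream, input_hashes)
        for eg in stream:
            eg["options"] = OPTIONS
            yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "choice",
        "config": {
            "choice_style": "single",  # or "multiple"
            # Automatically accept and submit the answer if an option is
            # selected (only available for single-choice tasks)
            "choice_auto_accept": False
        }
    }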
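For reference, the behavior expected from the filtering step can be sketched in plain Python, without Prodigy. This only illustrates the intended logic and is not Prodigy's implementation; `skip_seen` and the sample tasks and hash values are hypothetical.

```python
def skip_seen(stream, seen_hashes, key="_input_hash"):
    """Yield only the tasks whose hash is not already in seen_hashes.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    seen = set(seen_hashes)
    for eg in stream:
        if eg.get(key) not in seen:
            yield eg


# Made-up tasks: the first two stand in for images that were already annotated
tasks = [
    {"image": "a.jpg", "_input_hash": 1},
    {"image": "b.jpg", "_input_hash": 2},
    {"image": "c.jpg", "_input_hash": 3},
]
remaining = list(skip_seen(tasks, seen_hashes=[1, 2]))
```

With hashes 1 and 2 already in the dataset, only the task for c.jpg should be served after a restart.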
I identified an image that was presented and annotated twice, exported my annotations with db-out, and then compared the input hash and task hash of the two records. They are exactly the same.
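The check described above can be automated over a whole export. The following is a minimal sketch, assuming the db-out export is a JSONL file where each record carries `_input_hash` and `_task_hash` fields; `find_duplicates` and the sample records are hypothetical and only illustrate the idea.

```python
import json
from collections import defaultdict


def find_duplicates(jsonl_lines):
    """Group records by (_input_hash, _task_hash) and return groups seen more than once.

    Returns a dict mapping each repeated hash pair to the 1-based line numbers
    where it occurs. Hypothetical helper, not part of Prodigy.
    """
    groups = defaultdict(list)
    for line_no, line in enumerate(jsonl_lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        eg = json.loads(line)
        key = (eg.get("_input_hash"), eg.get("_task_hash"))
        groups[key].append(line_no)
    return {key: lines for key, lines in groups.items() if len(lines) > 1}


# Made-up records standing in for an exported dataset
sample = [
    '{"image": "a.jpg", "_input_hash": 111, "_task_hash": 222, "answer": "accept"}',
    '{"image": "b.jpg", "_input_hash": 333, "_task_hash": 444, "answer": "ignore"}',
    '{"image": "a.jpg", "_input_hash": 111, "_task_hash": 222, "answer": "accept"}',
]
duplicates = find_duplicates(sample)
```

On a real export, an open file handle can be passed in place of `sample`, since the function only iterates over lines. Any hash pair that appears in the result was saved to the dataset more than once.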
Prodigy still repeats images, although this is meant to be avoided automatically, and it still repeats them even with the additional filtering logic in my recipe above (taken from here).
Every time I restart the annotation server it shows me the same images in the exact same order, completely ignoring the task_hash and input_hash.
I have tried creating a new dataset, annotating the first example, closing the server, and restarting the server on the same dataset. I am served the same initial image, and both annotations get written to the output file. This is really messing with my workflow.
Prodigy 1.10.4
Ubuntu 20.04