I'm working through what I imagine must be a pretty standard use case:
- load in a directory full of images
- select whether the image is a thing or is not (binary classification)
- use those classifications (i.e. the image name tied to whatever the correct classification was) later on in a computer vision DL project.
I can't quite seem to get Prodigy to do what I want, though. I tried using the mark recipe as suggested in your docs, but this doesn't quite do it. I had a few specific questions off the back of this:
I read this support topic where it is stated that images are loaded in alphabetical order. I have 65K+ images in my directory. I don't want to label them all, at least not initially. I want to label a random sample of a few hundred or a thousand to get a sense of a baseline for this collection of images. To get this, I'll want to be served the images randomly.
Q: Is there a way to have images loaded in randomly? (I reckon one way of doing this would be to rename all the files with random alphanumeric strings, though then I'd lose the original file names. Is there another way?)
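To make it concrete, this is roughly the kind of sampling I have in mind, sketched in plain Python outside of Prodigy. It is not a Prodigy API: the function name, the extension list and the seed parameter are all my own assumptions. It keeps the original file names and just picks a reproducible random subset.

```python
import random
from pathlib import Path

# Assumption: which file extensions count as images in my directory.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def sample_image_paths(directory, sample_size, seed=0):
    """Return a reproducible random sample of image paths.

    Hypothetical helper: the original file names are left untouched.
    """
    # Sort first so the same seed gives the same sample on every run.
    paths = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    rng = random.Random(seed)
    # Never ask for more items than the directory contains.
    k = min(sample_size, len(paths))
    return rng.sample(paths, k)
```

The sampled paths could then be copied or symlinked into a smaller directory to annotate from, but I'd rather not have to do this by hand if there is a built-in option.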
I also noted that when I stopped annotating (Cmd-S to save the annotations to the database, then quitting the process in the terminal with Ctrl-C) and restarted, it prompted me to annotate from the very beginning again, including all the images that I'd already labelled.
I saw this support forum thread. I added the setting "feed_overlap": false to my prodigy.json, but it did nothing. I'm also not quite sure what that setting does exactly (and whether I should remove it).
Q: Is there a way to have Prodigy not ask me to relabel images that I've already labelled?
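The behaviour I'm after is essentially the filter below, again sketched in plain Python rather than with anything from Prodigy. The function name is hypothetical, and I'm assuming I could somehow get hold of the set of file names that already have an annotation.

```python
from pathlib import Path


def skip_already_labelled(paths, labelled_names):
    """Yield only the paths whose file name has not been annotated yet.

    Hypothetical helper: `labelled_names` is assumed to be a collection of
    file names (not full paths) that already have a saved annotation.
    """
    seen = set(labelled_names)
    for path in paths:
        # Compare on the bare file name so the directory prefix doesn't matter.
        if Path(path).name not in seen:
            yield path
```

For example, with `["a/x.jpg", "a/y.jpg"]` as the paths and `{"y.jpg"}` already labelled, only `a/x.jpg` would be served again.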
And I noticed that (as per the documentation) Prodigy is saving the actual image files themselves in the database. I wasn't quite sure why this was happening. I mainly just want to save the annotation itself, tied to the original filename.
Q: Is there a way to set --remove-base64 when using the