When I run the db-out command on a dataset containing a large number of images, the command is killed and no data is written to the specified file.
$ prodigy db-out insurance_img_annotation > insurance_img_annotation.jsonl
/home/COMPUTE/eadkins/anaconda3/bin/prodigy: line 1: 8221 Killed python -m prodigy "$@"
I realized (belatedly) that I was writing the actual image data to the task. I suspect this is the problem, and I am working on changing it for future jobs (db-out works fine for large sets containing only text data). But for this one dataset, how can I recover my annotations?
This seems to be a memory issue. I copied the entire database prodigy.db from the .prodigy/ directory to another computer with more memory available and was able to run db-out successfully. This seems like a rather clunky workaround, but at least I have the annotations.
Thanks for updating with your solution!
Even though it’s clunky, one thing I like about SQLite is that it gives you one straightforward file that you can back up and move around easily. Btw, if you’re ever in a situation where you want to change things manually (e.g. remove the encoded image data), you could also give the SQLite DB browser a try. Just make sure to back up the .db file first.
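If you'd rather script it than click through a GUI, here is a rough sketch of the same idea using Python's built-in sqlite3 module. It reads one dataset row by row, so only a single task is in memory at a time, and drops the inline image data before writing JSONL. Note the assumptions: the table and column names (dataset.name, link.example_id, link.dataset_id, example.content) are my reading of how the SQLite file is laid out and may differ in your version, the function name export_without_images is made up for this example, and it assumes the encoded image is stored as a data: URI under the "image" key. Check the schema first and run it against a backup copy.

```python
import json
import sqlite3


def export_without_images(db_path, dataset_name, out_path, image_key="image"):
    """Stream one dataset to a JSONL file, dropping inline image data."""
    # ASSUMED schema (verify against your own prodigy.db):
    #   dataset(id, name), link(example_id, dataset_id), example(id, content)
    # where example.content holds the task serialized as JSON.
    query = (
        "SELECT example.content FROM example "
        "JOIN link ON link.example_id = example.id "
        "JOIN dataset ON dataset.id = link.dataset_id "
        "WHERE dataset.name = ? ORDER BY example.id"
    )
    conn = sqlite3.connect(db_path)
    written = 0
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # Iterating the cursor fetches rows lazily, so the whole
            # dataset is never loaded into memory at once.
            for (content,) in conn.execute(query, (dataset_name,)):
                if isinstance(content, bytes):
                    content = content.decode("utf-8")
                task = json.loads(content)
                value = task.get(image_key)
                # Only drop inline data URIs; keep paths and URLs.
                if isinstance(value, str) and value.startswith("data:"):
                    del task[image_key]
                out.write(json.dumps(task) + "\n")
                written += 1
    finally:
        conn.close()
    return written
```

You would call it with something like export_without_images("prodigy.db", "insurance_img_annotation", "insurance_img_annotation.jsonl"). The annotations themselves stay in each task, but the images are gone from the export, so this is only useful if you can match tasks back to the original image files some other way.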