Hi @biagiodistefano,
Thanks for the update!
I’m not saving base64-encoded data; I swap that out with the image path in the before_db method (see recipe).
There is that extra `and` condition checking for the existence of `path`, which is why I originally said that you might "potentially" be saving the encoded data. But if you are sure it's not being saved, then that's out of the equation.
I have successfully deployed it on a different droplet and also managed to save some batches. Same OS, same Python environment, different specs. However, the process gets killed after a while.
What was the difference in specs?
While I’m here, just out of curiosity: does prodigy try to load all the images in memory or does it load them lazily? I’m asking because I had folders with thousands of images that caused the process to get killed immediately, and only splitting them into folders with fewer images solved the problem.
Prodigy works by buffering batches of data, so with a single annotator only about three batches should ever be loaded in memory: the current batch being annotated, the upcoming batch, and the already annotated batch that's being buffered for possible edits before it's saved to the DB. This might change if there are multiple annotators with `feed_overlap` set to false, as more data will have to be pulled. But it is definitely not loading the entire dataset, unless your batch size is massive, of course. I'm not sure what caused the failure there, but the (now legacy) Image loader streams the files from the directory on a pull basis, so it should never try to load the entire folder.
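To illustrate what I mean by streaming on a pull basis, a lazy loader is essentially just a generator over the directory, along these lines. This is a simplified sketch, not the actual Prodigy implementation, and the task fields are assumptions:

```python
import base64
from pathlib import Path

def stream_images(source_dir):
    # Yield one task at a time: only the files needed for the current
    # batch are ever read into memory, never the whole folder.
    for path in sorted(Path(source_dir).iterdir()):
        if path.suffix.lower() not in (".jpg", ".jpeg", ".png", ".gif"):
            continue
        with path.open("rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        yield {
            # MIME subtype is simplified for the sketch
            "image": f"data:image/{path.suffix.lstrip('.')};base64,{encoded}",
            "path": str(path),
        }
```

So even with thousands of files in a folder, the stream itself shouldn't be the thing that blows up memory.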
I was wondering if there's anything in the front-end console, perhaps? It looks like this user here had a similar issue.