Unknown random crashes when reviewing images


(Hao Xi) #1

I am using Prodigy to accept / reject images. I have about 26k images, ranging in size from 200 KB to 15 MB, and the image frame size is set to 800x600px.

I set PRODIGY_LOGGING=verbose before launching Prodigy, and it does log a lot more information. However, after I had reviewed about 1.5k images, Prodigy crashed, and I didn’t see any useful information at the end of the log file.

How can we diagnose a crash in Prodigy? We don’t want to restart Prodigy every 5 hours.


(Ines Montani) #2

What exactly do you mean by “crashes”? Does the web app or the server stop working? Is there any error message in the terminal or console? And when you look at the logs, can you identify the example it fails on?

(Hao Xi) #3

When I say ‘crashed’, I mean the Prodigy processes were unexpectedly terminated for some reason. I did enable ‘verbose’ level logs, but I didn’t see any useful information. Are there any other logs besides the console output?

(Matthew Honnibal) #4

Could it have been an out-of-memory error? That wouldn’t produce any obvious log of the problem.

(Hao Xi) #5

I don’t think so, because I am running Prodigy on our machine learning server, which has 60GB of RAM. I am the only person who uses this server, and I am pretty sure I didn’t run any training during the image review, so it should have had plenty of free RAM (~50GB+).

(Hao Xi) #6

As a workaround, I am using a tool called immortal, a utility that monitors processes (including Prodigy) and automatically restarts them.
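If the process is launched from a small wrapper, the exit status itself can say something about the cause. The sketch below is not part of Prodigy or immortal; `describe_exit` and `run_and_report` are hypothetical helper names. It relies on the fact that, on POSIX systems, Python's `subprocess` reports a negative return code when the child was killed by a signal. A SIGKILL would be consistent with (though not proof of) the Linux out-of-memory killer.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative value means "terminated by signal -returncode".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run a command to completion and report how it ended (hypothetical helper)."""
    proc = subprocess.run(cmd)
    return proc.returncode, describe_exit(proc.returncode)


if __name__ == "__main__":
    # Pass the real launch command on the command line; it is not guessed here.
    code, summary = run_and_report(sys.argv[1:] or [sys.executable, "-c", "pass"])
    print(summary)
```

Logging this summary with a timestamp on every restart would at least show whether the process exits on its own or is killed from outside.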

(Matthew Honnibal) #7

Thanks for the report, and sorry for the problem. I’m glad you have a workaround, but that’s definitely unsatisfying! We’ll try to figure out what the problem could be.

You can edit the image recipe script in your installation, or copy the file and run it as a custom recipe. You might try adding a little loop that iterates over the stream, so that you can try to debug the problem.
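A minimal sketch of such a loop, assuming the stream is an ordinary Python iterable of task dicts (which is how you would pass it in from a copy of the recipe). `scan_stream` is a hypothetical helper name, not something Prodigy provides. It consumes the stream without the web app and records the index at which reading fails, if it fails at all.

```python
import traceback


def scan_stream(stream, report_every=100):
    """Consume a stream of tasks and record where reading fails.

    Returns (number of tasks read successfully, list of (index, error) pairs).
    """
    failures = []
    count = 0
    iterator = iter(stream)
    while True:
        try:
            next(iterator)
        except StopIteration:
            break
        except Exception as err:
            # Record the index of the task that could not be produced.
            failures.append((count, repr(err)))
            traceback.print_exc()
            break  # a generator that has raised cannot be resumed
        count += 1
        if count % report_every == 0:
            print(f"read {count} tasks OK")
    return count, failures
```

If this loop gets through all 26k images cleanly, the problem is more likely in the serving layer or outside the process than in the loading code.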

The strange thing is that the Python logic in the image recipe really couldn’t be much simpler. On each call that the front-end makes to the get_questions endpoint, the Python service will just read a batch of images, and format them in a response.

I wonder whether the responses are sometimes above some hard limit? You can add extra debugging in app.py to print the size of the response objects. Again, you might want to just write a little function which repeatedly calls get_questions(). It’s usually hard to debug these things via the front-end, but if you debug from Python, it should be pretty easy.
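A rough sketch of that kind of probe, under a few assumptions: `get_questions` here is any zero-argument callable you supply that returns the next batch as a list of JSON-serializable dicts, and `response_size_bytes`, `probe_batches` and the 50 MB warning threshold are made up for illustration. The size is measured by serializing the batch to JSON, which approximates what would go over the wire.

```python
import json


def response_size_bytes(tasks):
    """Approximate the size of a batch once serialized to JSON."""
    return len(json.dumps(tasks).encode("utf-8"))


def probe_batches(get_questions, n_calls, warn_bytes=50 * 1024 * 1024):
    """Call get_questions() repeatedly and print the size of each batch.

    get_questions is a caller-supplied callable (an assumption of this sketch).
    Stops early if a call returns an empty batch.
    """
    sizes = []
    for i in range(n_calls):
        batch = get_questions()
        if not batch:
            break
        size = response_size_bytes(batch)
        sizes.append(size)
        flag = "  <-- large" if size > warn_bytes else ""
        print(f"call {i}: {len(batch)} tasks, {size / 1e6:.1f} MB{flag}")
    return sizes
```

If the images are embedded in the response as base64 strings, each one grows by about a third, so a 15 MB file becomes roughly 20 MB of text and a batch of several such images adds up quickly.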