We had a problem with one of our application servers on Friday. There was a long trail of breadcrumbs, which I followed all the way back to a blindingly obvious problem. We had run out of disk space. Doh!
Why did this happen? There were two contributing factors:
- Unknown to me, about two years ago marketing had asked us to stop cleaning up the Apache access logs, so that step had been commented out of the automated cleanup script. Over that period, the logs had built up to over 7.5G.
- The tool for monitoring disk space usage was configured to send warning messages out via the wrong mail server.
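For what it's worth, a second, independent check on disk usage is cheap to add. The following is only a minimal sketch in Python, not the monitoring tool we actually use. The function names, the 90% threshold and the path are all assumptions made for illustration.

```python
import shutil


def disk_usage_fraction(path="/"):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_disk(path="/", threshold=0.9):
    """Return a warning string if usage is at or above `threshold`, else None.

    The 0.9 default is an arbitrary, hypothetical threshold.
    """
    fraction = disk_usage_fraction(path)
    if fraction >= threshold:
        return f"WARNING: {path} is {fraction:.0%} full"
    return None


if __name__ == "__main__":
    message = check_disk("/", 0.9)
    if message:
        print(message)
```

The sketch only builds the warning. How it gets delivered is a separate matter, and as the second point above shows, the delivery path is the part worth testing every so often.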
Well, you have to chalk it up to experience, but it has served to remind me once again that it is often the simplest and most obvious things that cause problems. Sometimes, the more you know, the easier it is to lose sight of the obvious.
Cheers
Tim…