Server Problems : Update

 

This is a follow-on from my server problems post from yesterday…

Regarding the general issue, misiaq came up with a great suggestion, which was to use watchdog. It’s not going to “fix” anything, but if I get a reboot when the general issue happens, that would be much better than having the server sit idle for 5 hours until I wake up. 🙂 Let’s see how that works out…

Praveen asked if I use any tools like Webmin. The answer is yes and no. Just like my use of any tool (Cloud Control, SQL Developer etc.) I use a combination of command line and tools. I usually find command line more useful as I can script and reuse, but I always have tools available to fill in the gaps and provide inspiration. I don’t always invest enough time in learning the tools well, which is why useful bits of them pass me by on occasion, but I also don’t like to become dependent on tools. In the case of Webmin, it is installed on the server, but it is not exposed to the outside world. I have to tunnel in to use it, so during a problem, when I can’t SSH to the server, Webmin is not available. 🙂

Back to the specific issue from yesterday…

During my normal checks I noticed my RAID1 setup looked like this.

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[1]
      970470016 blocks [2/1] [_U]

md1 : active raid1 sdb1[1]
      4194240 blocks [2/1] [_U]

unused devices: <none>
#

Last time it looked like this, one of the hard drives had died, so I contacted the hosting company to get it sorted. After a couple of false starts, they eventually took the machine offline, tested it and said the hard drives were fine. 🙁
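In the output above, “[2/1]” means the array expects two members but only one is active, and the underscore in “[_U]” marks the missing one. A check like that could be scripted rather than done by eye. What follows is only a minimal sketch, not something that was run on this server. The function name degraded_arrays and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose member-status
# field (for example [_U]) contains an underscore, i.e. a member is missing.
# The optional file argument exists only to make testing easy;
# it defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                    # remember the current array
        /blocks/ && $NF ~ /^\[.*_.*\]$/ { print name } # status like [_U] or [U_]
    ' "${1:-/proc/mdstat}"
}
```

Run against the output shown above, this would print md3 and md1, one per line. Something like this could be called from cron so a dropped disk gets noticed sooner, but that part is left out here.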

I added the partitions from the “/dev/sda” disk back into the RAID config. I guess I should have tried that first. 🙂

# mdadm /dev/md1 -a /dev/sda1
mdadm: added /dev/sda1
# mdadm /dev/md3 -a /dev/sda3
mdadm: added /dev/sda3
#

After it had finished rebuilding it looked like this.

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda3[0] sdb3[1]
      970470016 blocks [2/2] [UU]

md1 : active raid1 sda1[0] sdb1[1]
      4194240 blocks [2/2] [UU]

unused devices: <none>
#
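While a rebuild is running, /proc/mdstat includes an extra progress line under the array containing something like “recovery = 12.6%”. The sketch below pulls that percentage out, assuming that line format. The function name rebuild_progress and the file argument are hypothetical, and this is not what was used during the rebuild described here.

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <percent>" for each md array that is
# currently rebuilding or resyncing. Prints nothing when no rebuild is running.
# The optional file argument is for testing; it defaults to /proc/mdstat.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }           # remember the current array
        /(recovery|resync) *=/ {              # progress line for that array
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) {
                    p = $i
                    sub(/^=/, "", p)          # "=100.0%" can appear without a space
                    print name, p
                }
        }
    ' "${1:-/proc/mdstat}"
}
```

Wrapping it in something like “watch -n 60” would save repeatedly typing “cat /proc/mdstat” while waiting for the [UU] status to come back.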

So it looks like the drive just dropped out of the RAID config for no reason… Does that happen?

As I said before, I’m not a system administrator. I just know enough to be dangerous. 🙂 Thanks for the comments from yesterday. They have been very helpful…

Cheers

Tim…

Author: Tim...

DBA, Developer, Author, Trainer.

2 thoughts on “Server Problems : Update”

  1. Hi Tim.
    I’m not a big guy with System Admin things, but I do know the basics through training.
    “mdstat” did not come to mind, but “mdadm -D /dev/mdX” did.
    Thanks for that “mdstat”!
