Check for a failed drive in Linux software RAID


This is a handy script for detecting a failed disk in a Linux software RAID array. It works by comparing the current output of /proc/mdstat with a copy saved while the array was functioning properly. The first time it runs, it creates an "OK_FILE", on the assumption that the array is sound at that point. On subsequent invocations it checks whether the output of /proc/mdstat has changed. If it has, it sends an e-mail to the MAILTO address. I run it from a crontab at, e.g., 2 a.m.:

0 2 * * *      /root/check_raid.sh   >> /dev/null
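
The comparison described above can be tried out without degrading a real array. The following is a minimal sketch, not part of the original script: raid_changed is a hypothetical helper name, and the two snapshots are illustrative examples of what /proc/mdstat might show for a two-disk RAID 1, not output captured from a real machine.

```shell
#!/bin/bash
# Sketch: the same diff-based comparison, run against illustrative snapshots
# instead of the live /proc/mdstat.

# raid_changed BASELINE CURRENT
# Succeeds (exit status 0) when the two files differ.
raid_changed() {
    ! diff -q "$1" "$2" > /dev/null
}

WORKDIR=$(mktemp -d)

# Illustrative healthy state: both members up ([UU])
cat > "$WORKDIR/raid.ok" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
EOF

# Illustrative degraded state: sdb1 marked failed, one member missing ([U_])
cat > "$WORKDIR/raid.degraded" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>
EOF

if raid_changed "$WORKDIR/raid.ok" "$WORKDIR/raid.degraded"; then
    echo "RAID failed"
else
    echo "RAID OK"
fi
```

Because the check is a plain text comparison, any change in /proc/mdstat counts as a failure, including the progress lines that appear while an array is resyncing.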

To use the script, cut and paste the following and change MAILTO to your address.

#!/bin/bash

# Address that failure e-mails are sent to
MAILTO='admin@domain.com'

LOG_FILE=/root/raid.log
OK_FILE=/root/raid.ok

# If the OK file doesn't exist, create it from the current (assumed healthy) state
if [ ! -e "$OK_FILE" ]; then
    cat /proc/mdstat > "$OK_FILE"
fi

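rm -f "$LOG_FILE"

SYSTEM=$(uname -n)

# Snapshot the current RAID state
cat /proc/mdstat > "$LOG_FILE"

# Count the lines of difference between the baseline and the snapshot
DIFF=$(diff "$OK_FILE" "$LOG_FILE" | wc -l)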

if [ "$DIFF" -ne 0 ]; then
    echo "RAID failed"
    mail -s "URGENT: RAID disk failure detected on $SYSTEM" "$MAILTO" < "$LOG_FILE"
else
    echo "RAID OK"
fi
exit 0
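
Since the script only writes the OK file when it is missing, the baseline stays stale after a failed disk has been replaced, and every later run would keep sending mail. The following is a minimal sketch of one way to record a new baseline. reset_baseline is a hypothetical helper, not part of the original script, and its optional second argument exists only so it can be tried against a file other than /proc/mdstat.

```shell
#!/bin/bash
# Sketch: overwrite the baseline with the current RAID state.
# Meant to be run by hand once the array is healthy again.

# reset_baseline OK_FILE [MDSTAT_FILE]
# MDSTAT_FILE defaults to /proc/mdstat.
reset_baseline() {
    local ok_file=$1
    local mdstat=${2:-/proc/mdstat}
    cat "$mdstat" > "$ok_file"
}

# Example (assumes the OK_FILE path used by the script above):
#   reset_baseline /root/raid.ok
```

It is best to wait until any rebuild has finished before doing this, because /proc/mdstat includes a progress line during recovery and a baseline taken at that moment would no longer match once the rebuild completes.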

Author: Vladimir Vuksan