Useful scripts for Nagios

These are some of my homebrew scripts that I use with Nagios.

1. Alert based on Ganglia metrics

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. A basic Ganglia installation comes with a number of built-in metrics such as load average, CPU utilization and free disk space, and you can also add custom metrics. Being able to alert on these metrics in Nagios is very useful. One benefit is that you can often avoid installing NRPE altogether, since most of the metrics are already being pushed via Ganglia. If you are looking for some custom metrics ;-) you can find some here.

The following script, check_ganglia_metric.php, can be used as a generic Nagios plug-in. Download it, save it as e.g. check_ganglia_metric.php and put it in your Nagios plug-ins directory, e.g. /var/lib/nagios/plugins.

Installation

  1. The check_ganglia_metric.php script requires the PHP CLI binary.
  2. You need the Ganglia Web package installed and configured (conf.php) so that it pulls data from a server that aggregates data from the servers you want to monitor. By default Ganglia Web connects to localhost. Change the following variables if your setup is different
  3. Once Ganglia Web is installed, adjust the path in the check_ganglia_metric.php script to point to it. It defaults to
    $GANGLIA_WEB = "/var/www/html/ganglia";
  4. Now run the script you downloaded above, i.e. php check_ganglia_metric.php. You should see
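The plug-in reads metrics through Ganglia Web, but it helps to know what the underlying data looks like. Below is a minimal Python sketch, not part of check_ganglia_metric.php, that pulls one metric out of the XML dump gmond sends to any client connecting to its TCP port (8649 by default). It assumes the standard Ganglia XML layout of HOST elements containing METRIC elements with NAME and VAL attributes; the function names are hypothetical. The same parsing should also work on gmetad's XML port (8651 by default), which wraps clusters in a GRID element.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 5.0) -> bytes:
    """Read the complete XML dump that gmond sends to any client connecting to its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def find_metric(xml_data: bytes, host: str, metric: str):
    """Return the VAL attribute of `metric` for the HOST whose NAME or IP equals `host`, else None."""
    root = ET.fromstring(xml_data)
    for host_el in root.iter("HOST"):  # iter() searches all depths, so GRID/CLUSTER nesting does not matter
        if host in (host_el.get("NAME"), host_el.get("IP")):
            for metric_el in host_el.iter("METRIC"):
                if metric_el.get("NAME") == metric:
                    return metric_el.get("VAL")
    return None
```

For example, find_metric(fetch_gmond_xml("ganglia-aggregator"), "10.0.0.5", "load_one") would return the host's current 1-minute load as a string (ganglia-aggregator is a placeholder host name).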

Usage/Nagios Configuration

To configure Nagios, add the following check command:
define command{
        command_name    check_ganglia
        command_line    /usr/bin/php /var/lib/nagios/plugins/check_ganglia_metric.php $HOSTADDRESS$ $ARG1$ '$ARG2$' $ARG3$
        }
The single quotes around $ARG2$ stop the shell from treating the | in values such as more|5 as a pipe.
Now you can use it in a service check. For instance, to be alerted when the 1-minute load average goes over 5, add the following directive:
        check_command			check_ganglia!load_one!more|5
To be alerted when free disk space drops below 10 GB, use this one:
        check_command			check_ganglia!disk_free!less|10
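Nagios hands the pieces of a check_command to the command definition by splitting it on "!": the first field is the command name and the remaining fields become $ARG1$, $ARG2$ and so on, with any $ARGn$ that was not supplied expanding to an empty string. A rough Python illustration of that substitution follows; the function name is hypothetical, and real Nagios expands many more macros and also accepts an escaped "\!" inside an argument, which this sketch ignores.

```python
MAX_ARGS = 32  # Nagios supports $ARG1$ through $ARG32$


def expand_check_command(check_command: str, command_line: str, host_address: str):
    """Return (command_name, expanded_command_line) for e.g. 'check_ganglia!load_one!more|5'.

    Simplified illustration: only $HOSTADDRESS$ and $ARGn$ are expanded.
    """
    name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for n in range(1, MAX_ARGS + 1):
        # Arguments that were not supplied expand to an empty string
        value = args[n - 1] if n <= len(args) else ""
        expanded = expanded.replace(f"$ARG{n}$", value)
    return name, expanded
```

For check_ganglia!load_one!more|5, $ARG1$ becomes load_one, $ARG2$ becomes more|5 and $ARG3$ is empty. Note that the expanded line still contains a literal |, which a shell would treat as a pipe unless that argument is quoted.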
Keep in mind that the operator describes the "critical" condition. For instance, notequal means the state is critical when the value is NOT equal to the one you give.
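The operator logic can be sketched as follows. This is a minimal Python illustration, not the code inside check_ganglia_metric.php: it assumes the second argument has the form operator|threshold as in the examples above, supports only the more, less and notequal operators mentioned here (the real script may accept others), and uses the standard Nagios plug-in exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plug-in exit codes
OK, CRITICAL, UNKNOWN = 0, 2, 3


def _as_number(text):
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def evaluate(value: str, spec: str):
    """Return (exit_code, status_line) for a metric value and an 'operator|threshold' spec.

    The spec describes the CRITICAL condition, e.g. 'more|5' is critical when value > 5.
    """
    operator, sep, threshold = spec.partition("|")
    if not sep:
        return UNKNOWN, f"UNKNOWN - expected operator|value, got {spec!r}"
    v, t = _as_number(value), _as_number(threshold)
    if operator == "notequal":
        # Compare numerically when both sides are numbers ("5.0" equals "5"), else as strings
        critical = (v != t) if v is not None and t is not None else value != threshold
    elif operator in ("more", "less"):
        if v is None or t is None:
            return UNKNOWN, f"UNKNOWN - non-numeric value {value!r} or threshold {threshold!r}"
        critical = v > t if operator == "more" else v < t
    else:
        return UNKNOWN, f"UNKNOWN - unsupported operator {operator!r}"
    if critical:
        return CRITICAL, f"CRITICAL - value is {value} ({operator} {threshold})"
    return OK, f"OK - value is {value}"
```

A plug-in built this way would print the status line and exit with the returned code; Nagios derives the service state from that exit code.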

Happy monitoring :-).


2. Send alerts to AOL instant messenger users (AIM)

a) You will need to download and install the Net::AIMTOC Perl module from CPAN.
b) Download the following script, send_aim_msg.txt. Save it as send_aim_msg.pl in e.g. /etc/nagios and make sure it has executable permissions.
c) Look in the file and adjust the $robotuser and $robotpassword variables.
d) Add the following to /etc/nagios/misccommands.cfg:

define command{
        command_name    notify-by-im
        command_line    echo "Service: $SERVICEDESC$<BR>Host: $HOSTNAME$<BR>Address: $HOSTADDRESS$<BR>State: $SERVICESTATE$<BR>Info: $SERVICEOUTPUT$<BR>Date: $LONGDATETIME$" | /etc/nagios/send_aim_msg.pl <sysadminnicks>
        }

define command{
        command_name    host-notify-by-im
        command_line    /usr/bin/printf "%b" "Host '$HOSTALIAS$' is $HOSTSTATE$<BR>Info: $HOSTOUTPUT$<BR>Time: $LONGDATETIME$" | /etc/nagios/send_aim_msg.pl <sysadminnicks>
        }

Replace <sysadminnicks> with the AIM screen names of all the sysadmins you want to instant message.
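Both command definitions rely on the same simple contract: Nagios expands the macros, pipes the resulting text into the script's standard input, and passes the recipients as command-line arguments. A minimal Python sketch of that contract follows. It is not a port of send_aim_msg.pl, and send_im is a hypothetical placeholder for the actual delivery, which the original script performs with Net::AIMTOC.

```python
import sys


def build_messages(stdin_text: str, nicks: list) -> list:
    """Pair the notification text piped in by Nagios with every screen name given as an argument."""
    message = stdin_text.strip()
    if not message:
        raise ValueError("no notification text on stdin")
    return [(nick, message) for nick in nicks]


def send_im(nick: str, message: str) -> None:
    """Hypothetical stand-in for the real delivery (send_aim_msg.pl logs in via Net::AIMTOC)."""
    print(f"would send to {nick}: {message}")


def main(argv: list) -> int:
    """Entry point: argv holds the screen names, the message arrives on stdin."""
    if not argv:
        print("usage: send_im_sketch.py NICK [NICK ...] < message", file=sys.stderr)
        return 1
    for nick, message in build_messages(sys.stdin.read(), argv):
        send_im(nick, message)
    return 0
```

To run it as a script you would add sys.exit(main(sys.argv[1:])) at the bottom; send_im_sketch.py is only a placeholder name.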

e) Then add notify-by-im and host-notify-by-im to the notification commands of each contact that should receive instant messages, for example:

define contact{
        contact_name                    vuksan
        alias                           Vladimir Vuksan
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-by-email,notify-by-im
        host_notification_commands      host-notify-by-email,host-notify-by-im
        email                           vuksan@domain.com
        }

f) Don't forget to verify the Nagios configuration with /usr/sbin/nagios -v /etc/nagios/nagios.cfg.

3. Check LDAP

I was not happy with the check_ldap plug-in that comes with Nagios, so I took ldap.monitor from Mon, a monitoring facility somewhat similar to Nagios (though not as powerful).

a) Download check_ldap, rename it to e.g. check_ldap and stick it in /usr/libexec/. Make sure it has execute permissions and that the nagios user can execute it.

b) Add the following to checkcommands.cfg:

# Check LDAP
define command{
        command_name    check_ldap
        command_line    $USER1$/check_ldap $HOSTADDRESS$ -base="dc=domain,dc=com" -filter="uid=vuksan" -attribute=uid -value=vuksan
        }

Modify the -base, -filter, -attribute and -value settings to match your directory. The script connects to the LDAP server, searches using the filter uid=vuksan, and makes sure that the uid attribute has the value vuksan.
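The pass/fail logic described above can be sketched in Python as follows. This is not the Mon-derived check_ldap script: the LDAP query itself is abstracted behind a search callable (for example a thin wrapper around an LDAP client library, not shown), and entries are assumed to come back as dictionaries mapping attribute names to lists of values.

```python
# Standard Nagios plug-in exit codes
OK, CRITICAL = 0, 2


def values_for(entry: dict, attribute: str) -> list:
    """Look up an attribute case-insensitively, since LDAP attribute names are case-insensitive."""
    for name, values in entry.items():
        if name.lower() == attribute.lower():
            return values
    return []


def check_ldap(search, base: str, ldap_filter: str, attribute: str, expected: str):
    """Return (exit_code, status_line); `search(base, filter, attributes)` returns a list of entry dicts."""
    try:
        entries = search(base, ldap_filter, [attribute])
    except Exception as exc:  # bind, network or search errors all mean LDAP is not answering correctly
        return CRITICAL, f"CRITICAL - LDAP search failed: {exc}"
    if not entries:
        return CRITICAL, f"CRITICAL - no entries match {ldap_filter} under {base}"
    for entry in entries:
        if expected in values_for(entry, attribute):
            return OK, f"OK - found {attribute}={expected}"
    return CRITICAL, f"CRITICAL - no entry has {attribute}={expected}"
```

Keeping the search behind a callable makes the decision logic easy to test without a live directory server.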

c) Attach the check_ldap command to a service.

4. Check host connectivity using traceroute

Occasionally, hosts that are not pingable through ICMP echo may still be "pingable" through traceroute because, as man traceroute says:

Traceroute utilizes the IP protocol ‘time to live’ field and attempts to elicit an ICMP TIME_EXCEEDED response from each gateway along the path to some host.

I wrote a script to utilize that fact.

a) Download check_traceroute and stick it in /usr/libexec/. Make sure it has execute permissions and that the nagios user can execute it.
b) Add the following to checkcommands.cfg:

define command{
        command_name    check_traceroute
        command_line    $USER1$/check_traceroute $HOSTADDRESS$
        }

c) Attach the check_traceroute command to a service.
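The idea can be sketched in Python: run traceroute with numeric output and treat the host as up if its own address shows up as a responder on any hop. This is a minimal sketch, not the check_traceroute script itself; it assumes the Linux traceroute options -n (numeric output), -m (maximum hops) and -w (seconds to wait per probe), the default of 3 probes per hop, and the hop-line format shown in the docstring.

```python
import socket
import subprocess

# Standard Nagios plug-in exit codes
OK, CRITICAL, UNKNOWN = 0, 2, 3


def destination_reached(output: str, target_ip: str) -> bool:
    """True if any hop line of `traceroute -n` output lists target_ip as a responder.

    Hop lines look like ' 3  10.0.0.1  1.512 ms  1.498 ms  1.471 ms'; timeouts show as '*'.
    """
    for line in output.splitlines()[1:]:  # skip the 'traceroute to ...' header line
        fields = line.split()
        if fields and fields[0].isdigit() and target_ip in fields[1:]:
            return True
    return False


def check_traceroute(host: str, max_hops: int = 30, wait: int = 2):
    """Return (exit_code, status_line) for `host`."""
    try:
        target_ip = socket.gethostbyname(host)  # -n output contains addresses, not names
        result = subprocess.run(
            ["traceroute", "-n", "-m", str(max_hops), "-w", str(wait), target_ip],
            capture_output=True,
            text=True,
            timeout=max_hops * 3 * wait + 10,  # worst case: 3 unanswered probes per hop
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"UNKNOWN - traceroute could not run: {exc}"
    if destination_reached(result.stdout, target_ip):
        return OK, f"OK - {host} ({target_ip}) answered traceroute probes"
    return CRITICAL, f"CRITICAL - {host} ({target_ip}) did not answer traceroute probes"
```

Only a reply from the target's own address counts as "up" here; a router along the path answering while the target stays silent still yields CRITICAL.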

Author: Vladimir Vuksan. E-mail me, or follow me on Twitter at http://twitter.com/vvuksan