Useful scripts for Nagios
These are some of my homebrew scripts that I use with Nagios.
1. Alert based on Ganglia metrics
Ganglia is a scalable distributed monitoring system for
high-performance computing systems such as clusters and grids. A basic Ganglia installation comes with a number of built-in metrics such as
load average, CPU utilization, disk free, etc. You can also add custom metrics. Being able to use these metrics for Nagios alerting is
highly useful. One benefit is that we can even avoid installing NRPE, since most of the metrics are already being
pushed via Ganglia. If you are looking for some custom metrics ;-) you can find some here.
The following script can be used as a generic Nagios plug-in: check_ganglia_metric.php. Download it, save it as e.g.
check_ganglia_metric.php, and put it in your Nagios plug-ins directory, e.g. /var/lib/nagios/plugins
- The Check Ganglia Metrics script requires the PHP CLI binary
- To test, invoke PHP from the command line, e.g. php --version or php5 --version. If you get something back you are good to go. Otherwise install it:
  - Under Ubuntu you would type
    apt-get install php5-cli
  - Under RedHat/CentOS/Fedora type
    yum install php-cli
- You need the Ganglia Web package installed and configured (conf.php) so that it pulls data from the server that aggregates data from the
servers you want to monitor. By default Ganglia Web connects to localhost. Change the following variables if your setup is different
- $ganglia_ip = "127.0.0.1";
- $ganglia_port = 8652;
- Once you have Ganglia Web installed adjust the path in the check_ganglia_metric.php script to point to it e.g. it defaults to
- Now execute the script you downloaded above, i.e. php check_ganglia_metric.php. You should see
Usage: check_ganglia_metric.phps <hostname> <metric> <less|more|equal|notequal> <critical_value> ie.
check_ganglia_metric.phps server1 disk_free less 10
The less and more qualifiers specify whether the metric is marked critical when it is less than or more than the critical value
- If you get something like this
PHP Warning: include_once(./version.php): failed to open stream: No such file or directory in /var/www/html/ganglia/conf.php on line 7
modify your conf.php so that it includes version.php by its full path, i.e. include_once("/var/www/html/ganglia/version.php");
- At this point I would run an actual test e.g.
check_ganglia_metric.php server1 disk_free less 10
- If that succeeded you are now ready to configure Nagios
To configure Nagios, add the following check command
command_line /usr/bin/php /var/lib/nagios/plugins/check_ganglia_metric.php $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$
Now you can use it in a service check. For instance, to be alerted when the 1-minute load average goes over 5, you would add the following directive
To alert when free disk space drops below 10 GB, use this one
Keep in mind that the operator defines what counts as the "critical" state. For instance, with notequal the state is critical if the value is
not equal to the critical value.
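The operator semantics can be sketched in a few lines of Python. This is only an illustration of the less/more/equal/notequal logic described above, not the code of check_ganglia_metric.php; the function name evaluate and the sample readings are hypothetical, and the exit codes follow the usual Nagios plug-in convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```python
# Standard Nagios plug-in exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

# Each operator names the condition under which the state is CRITICAL.
OPERATORS = {
    "less": lambda value, critical: value < critical,
    "more": lambda value, critical: value > critical,
    "equal": lambda value, critical: value == critical,
    "notequal": lambda value, critical: value != critical,
}


def evaluate(metric, value, operator, critical):
    """Return (exit_code, status_line) for one metric reading."""
    if operator not in OPERATORS:
        return UNKNOWN, "UNKNOWN - unsupported operator '%s'" % operator
    if OPERATORS[operator](value, critical):
        return CRITICAL, "CRITICAL - %s = %s" % (metric, value)
    return OK, "OK - %s = %s" % (metric, value)


if __name__ == "__main__":
    # Hypothetical readings standing in for values fetched from Ganglia.
    for args in (("disk_free", 8.5, "less", 10.0),
                 ("load_one", 0.42, "more", 5.0)):
        code, line = evaluate(*args)
        print(code, line)
```

With these made-up readings, disk_free at 8.5 with "less 10" comes out critical, while load_one at 0.42 with "more 5" comes out OK.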
Happy monitoring :-).
2. Send alerts to AOL Instant Messenger (AIM) users
a) You will need to download and install the Net::AIMTOC Perl module
b) Download the following script: send_aim_msg.txt.
Save it as send_aim_msg.pl in e.g. /etc/nagios, make sure it has
c) Look in the file and adjust the $robotuser and $robotpassword
d) Add the following to /etc/nagios/misccommands.cfg
command_line echo "Service: $SERVICEOUTPUT$<BR>Date: $LONGDATETIME$" |
command_line /usr/bin/printf "%b" "Host '$HOSTALIAS$' is $HOSTSTATE$<BR>Info: $HOSTOUTPUT$<BR>Time: $LONGDATETIME$" | /etc/nagios/send_aim_msg.pl <sysadminnicks>
In place of <sysadminnicks>, put the nicks of all the sysadmins you want to instant message
e) Then just add notify-by-im and host-notify-by-im to
f) Don't forget to check the Nagios syntax with /usr/sbin/nagios -v
3. Check LDAP
I was not happy with the check_ldap script that comes with Nagios, so I
took ldap.monitor from Mon, which is a monitoring facility somewhat
similar to Nagios (though not as powerful).
a) Download check_ldap. Save it
as e.g. check_ldap and put it in /usr/libexec/. Make sure it has
execute permissions and that the nagios user can execute it
b) Add the following to checkcommands.cfg
command_line $USER1$/check_ldap $HOSTADDRESS$ -base="dc=domain,dc=com" -filter="uid=vuksan" -attribute=uid -value=vuksan
Modify the base, filter, attribute, and value entries to match your directory.
Basically, the script connects to the LDAP server, searches with the filter
uid=vuksan, and makes sure that the attribute uid has the value vuksan.
c) Attach check_ldap command to a service
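The verification step described above (does any entry matched by the filter carry the expected attribute value?) can be sketched in Python. This is not the check_ldap script itself: the LDAP search is left out, the function name check_attribute is hypothetical, and the search results are assumed to arrive as a list of dicts mapping attribute names to lists of values.

```python
# Standard Nagios plug-in exit codes.
OK, CRITICAL = 0, 2


def check_attribute(entries, attribute, expected):
    """Return (exit_code, status_line) for a list of LDAP search results.

    `entries` is assumed to be a list of dicts mapping attribute names to
    lists of string values, one dict per entry matched by the filter.
    """
    if not entries:
        return CRITICAL, "CRITICAL - filter matched no entries"
    for entry in entries:
        if expected in entry.get(attribute, []):
            return OK, "OK - %s=%s found" % (attribute, expected)
    return CRITICAL, "CRITICAL - %s does not have value %s" % (attribute, expected)


if __name__ == "__main__":
    # Hypothetical result of searching dc=domain,dc=com with uid=vuksan.
    results = [{"uid": ["vuksan"]}]
    print(check_attribute(results, "uid", "vuksan"))
```

An empty result list is treated as critical here, on the assumption that a server which answers but no longer returns the test entry is not healthy.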
4. Check host connectivity using traceroute
On occasion, hosts that are not pingable through ICMP may be "pingable"
through traceroute because, as man traceroute says, traceroute
utilizes the IP protocol ‘time to live’ field and attempts
to elicit an ICMP TIME_EXCEEDED response from each
gateway along the path to some host.
I wrote a script to utilize that fact.
a) Download check_traceroute.
Put it in /usr/libexec/. Make sure
it has execute permissions and that the nagios user can execute it
b) Add the following to checkcommands.cfg
command_line $USER1$/check_traceroute $HOSTADDRESS$
c) Attach check_traceroute command to a service
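As a rough illustration of the idea (not the check_traceroute script itself), a check of this kind can look at the traceroute output and report OK only if one of the hops is the target host. The sketch below assumes numeric output in the style of traceroute -n; the sample text is hand-written with documentation-range addresses, and the function name check_reachable is hypothetical. Output formats differ between traceroute implementations, so the pattern may need adjusting.

```python
import re

# Standard Nagios plug-in exit codes.
OK, CRITICAL = 0, 2

# Illustrative, hand-written output in the style of `traceroute -n`;
# the addresses come from the documentation ranges, not a real trace.
SAMPLE_OUTPUT = """\
traceroute to 203.0.113.7 (203.0.113.7), 30 hops max, 60 byte packets
 1  192.0.2.1  0.412 ms  0.398 ms  0.377 ms
 2  * * *
 3  203.0.113.7  11.204 ms  11.190 ms  11.317 ms
"""

# A hop line: hop number followed by the responding address or '*'.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)")


def check_reachable(output, target_ip):
    """Return (exit_code, status_line): OK if any hop is the target itself."""
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if match and match.group(2) == target_ip:
            return OK, "OK - %s reached at hop %s" % (target_ip, match.group(1))
    return CRITICAL, "CRITICAL - no response from %s" % target_ip


if __name__ == "__main__":
    code, line = check_reachable(SAMPLE_OUTPUT, "203.0.113.7")
    print(code, line)
```

In a real check the text would come from running traceroute against the host, for example with subprocess.run(["traceroute", "-n", host], capture_output=True, text=True), and the exit code would be passed to sys.exit so Nagios can read it.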
Vladimir Vuksan E-mail me. Or you can follow me on Twitter http://twitter.com/vvuksan