for more, see for yourself: collectd features
server (listens for values from the network and writes them to RRD files):
#Hostname "localhost"
FQDNLookup true
LoadPlugin network
LoadPlugin rrdtool
<Plugin network>
Listen "217.19.46.22"
Listen "2002:d913:2e16::1"
</Plugin>
<Plugin rrdtool>
DataDir "/var/lib/collectd/rrd"
</Plugin>
client (collects cpu and df values and sends them to the server):
#Hostname "localhost"
FQDNLookup true
LoadPlugin network
LoadPlugin cpu
LoadPlugin df
<Plugin network>
Server "rerun.lefant.net"
</Plugin>
enable the example web frontend using
sudo cp /usr/share/doc/collectd/examples/collection.cgi \
/usr/lib/cgi-bin/collection.cgi
and adapt the webserver configuration as needed.
demo url: http://rerun.lefant.net/cgi-bin/collection.cgi
just for fun:
***** Nagios *****
Notification Type: PROBLEM
Service: LOAD
Host: rerun.lefant.net
Address: rerun.lefant.net
State: CRITICAL
Date/Time: Thu Dec 4 17:20:48 CET 2008
Additional Info:
(Service Check Timed Out)
commands
timeperiods
define host{
use generic-host-bundled
host_name rerun.lefant.net
address rerun.lefant.net
parents odyssey.lefant.net
}
define service{
use generic-service-bundled
hosts www1,www2
service_description HTTP
check_command check_http
}
very easy to extend: just plug in your own check script, or use collectd-nagios (the debian version is buggy, see end)
define command{
command_name check_collectd
command_line /usr/local/bin/collectd-nagios \
-s /var/run/collectd-unixsock -H $HOSTNAME$ \
-n $ARG1$ -d $ARG2$ -w $ARG3$ -c $ARG4$
}
define command{
command_name check_collectd_percentage
command_line /usr/local/bin/collectd-nagios \
-s /var/run/collectd-unixsock -H $HOSTNAME$ \
-n $ARG1$ -g percentage -d $ARG2$ -d $ARG3$ \
-w $ARG4$ -c $ARG5$
}
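collectd-nagios gets its values from the collectd daemon through the unix socket passed with -s. a rough sketch of that exchange, assuming the unixsock plugin is loaded on the collectd side and that its plain-text GETVAL command replies with a status line followed by name=value lines (the function names are made up for illustration):

```python
import socket


def parse_getval_response(text):
    """Parse a GETVAL reply into {data_source: float}.

    Assumed reply layout: a status line "<n> <message>" followed by
    n lines of the form "name=value". A negative n signals an error.
    """
    lines = text.strip().splitlines()
    status, _, message = lines[0].partition(" ")
    count = int(status)
    if count < 0:
        raise RuntimeError(message)
    values = {}
    for line in lines[1:1 + count]:
        name, _, raw = line.partition("=")
        values[name] = float(raw)
    return values


def getval(sock_path, identifier):
    """Send GETVAL for "host/plugin/type" and return the parsed values."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall('GETVAL "{}"\n'.format(identifier).encode("ascii"))
        with sock.makefile("r") as reader:
            first = reader.readline()
            count = int(first.split(" ", 1)[0])
            # Read exactly as many value lines as the status line announced.
            rest = [reader.readline() for _ in range(max(count, 0))]
    return parse_getval_response(first + "".join(rest))
```

on a box with a running collectd, getval("/var/run/collectd-unixsock", "rerun.lefant.net/load/load") should return a dict keyed by data source (shortterm, midterm, longterm for load).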
define default properties via templates:
define host{
name generic-host
notifications_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
check_command check-host-alive
max_check_attempts 10
notification_interval 0
notification_period 24x7
notification_options d,u,r
contact_groups admins
register 0
}
this can also be inherited and customized:
define host {
use generic-host
name generic-host-bundled
register 0
contact_groups lefant
}
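the way "use" resolves can be modelled in a few lines. this is only a sketch of the lookup order, not how nagios implements it: it handles a single parent per object, ignores additive (+) inheritance, and treats "name" and "register" as the directives that are not passed down.

```python
# Directives that are not passed down from a template to its children.
NOT_INHERITED = {"name", "register"}


def resolve(obj, templates):
    """Return obj with all directives inherited via "use" filled in."""
    merged = {}
    parent_name = obj.get("use")
    if parent_name is not None:
        parent = resolve(templates[parent_name], templates)
        merged.update(
            {k: v for k, v in parent.items() if k not in NOT_INHERITED}
        )
    # Directives set locally always win over inherited ones.
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


templates = {
    "generic-host": {
        "name": "generic-host",
        "check_command": "check-host-alive",
        "max_check_attempts": "10",
        "notification_period": "24x7",
        "contact_groups": "admins",
        "register": "0",
    },
    "generic-host-bundled": {
        "use": "generic-host",
        "name": "generic-host-bundled",
        "register": "0",
        "contact_groups": "lefant",
    },
}

host = resolve(
    {"use": "generic-host-bundled", "host_name": "rerun.lefant.net"},
    templates,
)
print(host["contact_groups"])       # lefant
print(host["notification_period"])  # 24x7
print("register" in host)           # False
```

the host ends up with contact_groups from generic-host-bundled, everything else from generic-host, and is registered as a real object because "register 0" stays with the templates.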
this works fine for everything that is configured on the host object itself.
however, a host definition cannot list the services to be monitored on it…
define host{
use generic-host-bundled
host_name rerun.lefant.net
address rerun.lefant.net
parents odyssey.lefant.net
services HTTP,SMTP # doesn't work!!!
}
define service{
use generic-service-bundled
hosts www1,www2
service_description HTTP
check_command check_http
}
define service{
use generic-service-bundled
hosts www1
service_description SMTP
check_command check_smtp
}
imagine 10 or 20 services to be monitored on every box: you get to configure all of them, for every host.
the way out: attach each service to a hostgroup and put the hosts into the groups they need:
define service{
use generic-service-bundled
service_description HTTP
hostgroup_name HTTP
check_command check_http
}
define hostgroup{
hostgroup_name HTTP
}
define host{
use generic-host-bundled
host_name rerun.lefant.net
address rerun.lefant.net
parents odyssey.lefant.net
hostgroups HTTP,COLLECTD
}
then even grouping services works fine: every host in COLLECTD picks up all of the services below via hostgroup_members:
define service{
use generic-service-bundled
service_description SWAP-FREE
hostgroup_name SWAP-FREE
check_command check_collectd\
!swap/swap-free!value!209715200:!104857600:
}
define hostgroup{
hostgroup_name SWAP-FREE
hostgroup_members COLLECTD
}
(note the collectd nagios integration example here)
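how the check_command string turns into a command line: the parts after each "!" become $ARG1$, $ARG2$, … in the command definition. a minimal sketch of that substitution, using the check_collectd definition from earlier (the helper name is made up, and only $HOSTNAME$ and $ARGn$ are handled):

```python
def expand_check_command(check_command, commands, hostname):
    """Expand "name!arg1!arg2..." into the resulting command line."""
    name, *args = check_command.split("!")
    line = commands[name].replace("$HOSTNAME$", hostname)
    for number, value in enumerate(args, start=1):
        line = line.replace("$ARG{}$".format(number), value)
    return line


commands = {
    "check_collectd": (
        "/usr/local/bin/collectd-nagios"
        " -s /var/run/collectd-unixsock -H $HOSTNAME$"
        " -n $ARG1$ -d $ARG2$ -w $ARG3$ -c $ARG4$"
    ),
}

print(expand_check_command(
    "check_collectd!swap/swap-free!value!209715200:!104857600:",
    commands,
    "rerun.lefant.net",
))
```

for rerun.lefant.net this yields -n swap/swap-free -d value -w 209715200: -c 104857600:, i.e. warn below 200 MiB of free swap and go critical below 100 MiB.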
define service{
use generic-service-bundled
service_description LOAD
hostgroup_name LOAD
check_command check_collectd\
!load/load!midterm!5!10
}
define hostgroup{
hostgroup_name LOAD
hostgroup_members COLLECTD
}
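the hostgroup wiring above can be modelled as a small recursive lookup. this is a sketch only: the membership of odyssey.lefant.net in COLLECTD is a hypothetical example, and the function name is made up.

```python
def hosts_in_group(group, direct_members, subgroups, seen=None):
    """Return all hosts in a hostgroup, following hostgroup_members."""
    seen = set() if seen is None else seen
    if group in seen:  # guard against cyclic group definitions
        return set()
    seen.add(group)
    hosts = set(direct_members.get(group, ()))
    for child in subgroups.get(group, ()):
        hosts |= hosts_in_group(child, direct_members, subgroups, seen)
    return hosts


# Hosts join groups through their own "hostgroups" directive.
direct_members = {
    "HTTP": {"rerun.lefant.net"},
    # odyssey.lefant.net is a hypothetical second member.
    "COLLECTD": {"rerun.lefant.net", "odyssey.lefant.net"},
}
# "hostgroup_members" pulls a whole group into another one.
subgroups = {"SWAP-FREE": ["COLLECTD"], "LOAD": ["COLLECTD"]}

for group in ("HTTP", "SWAP-FREE", "LOAD"):
    print(group, sorted(hosts_in_group(group, direct_members, subgroups)))
```

adding a new box to COLLECTD is then enough to get SWAP-FREE and LOAD checks for it, without touching any service definition.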
(the percentage check needs a patch to collectd-nagios, see end)
define service{
use generic-service-bundled
service_description DF-ROOT
hostgroup_name DF-ROOT
#check_command check_collectd\
#!df/df-root!free!601000000:!301000000:
check_command check_collectd_percentage\
!df/df-root!used!free!20:!10:
}
define hostgroup{
hostgroup_name DF-ROOT
hostgroup_members COLLECTD
}
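a note on the thresholds used in these service definitions: "209715200:" and "5" are ranges in the usual nagios plugin notation, and a value outside the range raises the alert. a minimal sketch of that notation follows; collectd-nagios may differ in corner cases, and treating an empty lower bound like "~" is an assumption of this sketch.

```python
import math


def parse_range(text):
    """Parse "[@]min:max" into (low, high, inverted)."""
    inverted = text.startswith("@")
    if inverted:
        text = text[1:]
    if ":" in text:
        low_text, high_text = text.split(":", 1)
        # Assumption: an empty or "~" lower bound means minus infinity.
        low = -math.inf if low_text in ("", "~") else float(low_text)
    else:
        # A bare number N is shorthand for the range 0:N.
        low, high_text = 0.0, text
    high = math.inf if high_text in ("", "~") else float(high_text)
    return low, high, inverted


def status(value, warning, critical):
    """Return OK, WARNING or CRITICAL for a value and two range strings."""
    def alerts(range_text):
        low, high, inverted = parse_range(range_text)
        outside = value < low or value > high
        # "@" flips the meaning: alert when the value is inside the range.
        return outside != inverted

    if alerts(critical):
        return "CRITICAL"
    if alerts(warning):
        return "WARNING"
    return "OK"


MIB = 1024 * 1024
print(status(150 * MIB, "209715200:", "104857600:"))  # WARNING
print(status(7.0, "5", "10"))                         # WARNING
```

so the trailing colon makes the number a lower bound (alert when free swap drops below it), while the bare 5 and 10 of the LOAD check are upper bounds.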
more examples: lefant's config templates in git
to compile cnagios / collectd, install the build dependencies:
sudo aptitude install lex yacc automake autoconf \
libtool bison flex libltdl3-dev pkg-config \
libperl-dev libncurses5-dev
fixed collectd-nagios:
git clone git://git.verplant.org/collectd.git
cd collectd
git checkout 953bd0f881faa40c415a1f1a9d7e2da739d343ff
# or just use the latest version
my puny percentage patch:
playing with initial version of openvz monitoring via perl plugin:
know-how for this talk was in part sponsored by RevDev, my current employer
questions?