some links beforehand

collectd

see for yourself: collectd features

minimal collectd server config

#Hostname "localhost"
FQDNLookup true

LoadPlugin network
LoadPlugin rrdtool

<Plugin network>
        Listen "217.19.46.22"
        Listen "2002:d913:2e16::1"
</Plugin>

<Plugin rrdtool>
        DataDir "/var/lib/collectd/rrd"
</Plugin>

minimal collectd client config

#Hostname "localhost"
FQDNLookup true

LoadPlugin network
LoadPlugin cpu
LoadPlugin df

<Plugin network>
        Server "rerun.lefant.net"
</Plugin>

collectd webfrontend

enable by copying the example cgi script:

sudo cp /usr/share/doc/collectd/examples/collection.cgi \
/usr/lib/cgi-bin/collection.cgi

then adapt the webserver configuration as needed.

demo url: http://rerun.lefant.net/cgi-bin/collection.cgi

just for fun:

nagios

webinterface statusmap

statusmap

webinterface services

services

more screenshots

notification email

***** Nagios *****

Notification Type: PROBLEM

Service: LOAD
Host: rerun.lefant.net
Address: rerun.lefant.net
State: CRITICAL

Date/Time: Thu Dec 4 17:20:48 CET 2008

Additional Info:

(Service Check Timed Out)

cnagios

nagios objects

nagios object overview

host example

define host{
  use        generic-host-bundled
  host_name  rerun.lefant.net
  address    rerun.lefant.net
  parents    odyssey.lefant.net
}

nagios docs for host objects

service example

define service{
  use                  generic-service-bundled
  host_name            www1,www2
  service_description  HTTP
  check_command        check_http
}

nagios docs for services

command example

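very easy to extend: point command_line at your own script, or at collectd-nagios (the debian version is buggy, see end). the commands below assume collectd's unixsock plugin provides /var/run/collectd-unixsock.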

define command{
  command_name  check_collectd
  command_line  /usr/local/bin/collectd-nagios \
    -s /var/run/collectd-unixsock -H $HOSTNAME$ \
    -n $ARG1$ -d $ARG2$ -w $ARG3$ -c $ARG4$
}
define command{
  command_name  check_collectd_percentage
  command_line  /usr/local/bin/collectd-nagios \
    -s /var/run/collectd-unixsock -H $HOSTNAME$ \
    -n $ARG1$ -g percentage -d $ARG2$ -d $ARG3$ \
    -w $ARG4$ -c $ARG5$
}
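
for orientation, a rough python sketch of what such a check does under the hood, assuming collectd's unixsock plugin is loaded and answers the plain-text GETVAL command on the socket path used above. the function names (parse_getval_reply, getval) are made up for illustration, and this is not how collectd-nagios itself is implemented.

```python
import socket


def parse_getval_reply(text):
    """Parse a GETVAL reply such as '1 Value found\\nvalue=1.26e+00'.

    The first line carries a status number: negative means an error,
    otherwise it is the number of 'name=value' lines that follow.
    """
    lines = text.strip().splitlines()
    status, _, message = lines[0].partition(" ")
    count = int(status)
    if count < 0:
        raise RuntimeError(message)
    values = {}
    for line in lines[1:1 + count]:
        name, _, raw = line.partition("=")
        values[name] = float(raw)
    return values


def getval(identifier, path="/var/run/collectd-unixsock"):
    """Ask a running collectd for the current values of one identifier.

    `identifier` is host/plugin/type, e.g. 'rerun.lefant.net/load/load'.
    The socket path is an assumption taken from the command definitions.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(f'GETVAL "{identifier}"\n'.encode())
        reader = sock.makefile("r")
        first = reader.readline()
        count = int(first.split(" ", 1)[0])
        rest = [reader.readline() for _ in range(max(count, 0))]
    return parse_getval_reply(first + "".join(rest))
```

with a collectd instance running, getval("rerun.lefant.net/load/load")["midterm"] would be the number that -d midterm selects.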

templates

define default properties via templates:

define host{
  name                          generic-host
  notifications_enabled         1
  process_perf_data             1
  retain_status_information     1
  retain_nonstatus_information  1
  check_command                 check-host-alive
  max_check_attempts            10
  notification_interval         0
  notification_period           24x7
  notification_options          d,u,r
  contact_groups                admins
  register                      0
}

inheritance

templates can themselves inherit from other templates and override values:

define host{
  use                           generic-host
  name                          generic-host-bundled
  register                      0
  contact_groups                lefant
}

this works fine for anything that can be configured on the host object.
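
a small python sketch of how such a `use` chain can be thought of: walk up to the parent, then let local values win. this is a simplified model, not nagios code; it ignores multiple parents and additive (+) values, and the resolve helper and NOT_INHERITED set are hypothetical names. treating name and register as not inherited is an assumption based on my reading of the nagios inheritance docs.

```python
# Assumption: these two directives stay local to the object defining them.
NOT_INHERITED = {"name", "register"}


def resolve(name, templates):
    """Flatten one template by following its `use` chain; child values win."""
    obj = templates[name]
    merged = {}
    parent = obj.get("use")
    if parent:
        inherited = resolve(parent, templates)
        merged.update(
            {k: v for k, v in inherited.items() if k not in NOT_INHERITED}
        )
    # Local directives override whatever came from the parent.
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


# Trimmed-down copies of the two templates shown above.
templates = {
    "generic-host": {
        "name": "generic-host",
        "max_check_attempts": "10",
        "contact_groups": "admins",
        "register": "0",
    },
    "generic-host-bundled": {
        "use": "generic-host",
        "name": "generic-host-bundled",
        "register": "0",
        "contact_groups": "lefant",
    },
}

flat = resolve("generic-host-bundled", templates)
```

here flat keeps max_check_attempts from generic-host, while contact_groups is overridden to lefant.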

problem: service centric

however, a host definition cannot list the services to be monitored on it…

define host{
  use         generic-host-bundled
  host_name   rerun.lefant.net
  address     rerun.lefant.net
  parents     odyssey.lefant.net
  services    HTTP,SMTP         ; doesn't work!!!
}

problem: service centric (2)

define service{
  use                  generic-service-bundled
  host_name            www1,www2
  service_description  HTTP
  check_command        check_http
}
define service{
  use                  generic-service-bundled
  host_name            www1
  service_description  SMTP
  check_command        check_smtp
}

imagine 10 or 20 services to be monitored on every box: each service definition has to list every host it applies to, and every new host means touching all of them.

pay your service hostgroup tax!

define service{
  use                  generic-service-bundled
  service_description  HTTP
  hostgroup_name       HTTP
  check_command        check_http
}
define hostgroup{
  hostgroup_name       HTTP
}

host config with taxes paid

define host{
  use                  generic-host-bundled
  host_name            rerun.lefant.net
  address              rerun.lefant.net
  parents              odyssey.lefant.net
  hostgroups           HTTP,COLLECTD
}

service hostgroup tax (continued)

then even grouping services works fine:

define service{
  use                  generic-service-bundled
  service_description  SWAP-FREE
  hostgroup_name       SWAP-FREE
  check_command        check_collectd\
    !swap/swap-free!value!209715200:!104857600:
}
define hostgroup{
  hostgroup_name       SWAP-FREE
  hostgroup_members    COLLECTD
}

(note the collectd nagios integration example here)
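
the thresholds with a trailing colon follow the nagios plugin range convention: "N:" alerts when the value drops below N, while a bare "N" alerts when it leaves 0..N. a minimal python sketch of that logic, assuming collectd-nagios applies the convention the same way; parse_range and check are illustrative names, and inverted ranges are left out.

```python
import math


def parse_range(spec):
    """Return the (low, high) interval inside which a value is fine.

    Supported forms: "10" (0..10), "10:" (10..inf), "~:10" (-inf..10),
    "10:20". Anything else raises ValueError.
    """
    if ":" not in spec:
        return 0.0, float(spec)
    low, high = spec.split(":", 1)
    low = -math.inf if low == "~" else float(low)
    high = math.inf if high == "" else float(high)
    return low, high


def check(value, warn, crit):
    """Map a value to a state name; critical is tested before warning."""
    for state, spec in (("CRITICAL", crit), ("WARNING", warn)):
        low, high = parse_range(spec)
        if not (low <= value <= high):
            return state
    return "OK"


MIB = 1024 * 1024
# Thresholds from the SWAP-FREE service: warn below 200 MiB, crit below 100 MiB.
swap_state = check(150 * MIB, "209715200:", "104857600:")
```

with 150 MiB of free swap, swap_state ends up as WARNING.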

more nagios collectd service checks

define service{
  use                  generic-service-bundled
  service_description  LOAD
  hostgroup_name       LOAD
  check_command        check_collectd\
    !load/load!midterm!5!10
}
define hostgroup{
  hostgroup_name       LOAD
  hostgroup_members    COLLECTD
}
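
to see why a host only needs to join COLLECTD to pick up these checks, here is a python sketch of the group expansion as i understand it: a service attached to a hostgroup applies to every host in that group, including hosts pulled in through hostgroup_members. the data structures and the hosts_in_group helper are made up for illustration and are not how nagios stores its objects.

```python
def hosts_in_group(group, host_groups, group_members):
    """Hosts that belong to `group` directly or via nested member groups."""
    seen, stack, hosts = set(), [group], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against circular hostgroup_members
        seen.add(current)
        hosts |= {h for h, groups in host_groups.items() if current in groups}
        stack.extend(group_members.get(current, []))
    return hosts


# hostgroups directive from the host definition shown earlier
host_groups = {"rerun.lefant.net": {"HTTP", "COLLECTD"}}

# hostgroup_members directives from the service hostgroups
group_members = {
    "SWAP-FREE": ["COLLECTD"],
    "LOAD": ["COLLECTD"],
}

load_hosts = hosts_in_group("LOAD", host_groups, group_members)
```

load_hosts contains rerun.lefant.net even though that host never mentions LOAD itself.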

more nagios collectd service checks (2)

(percentage needs patch, see end)

define service{
  use                  generic-service-bundled
  service_description  DF-ROOT
  hostgroup_name       DF-ROOT
  #check_command        check_collectd\
  #!df/df-root!free!601000000:!301000000:
  check_command        check_collectd_percentage\
    !df/df-root!used!free!20:!10:
}
define hostgroup{
  hostgroup_name       DF-ROOT
  hostgroup_members    COLLECTD
}

further links

more examples: lefant's config templates in git

build dependencies for compiling cnagios / collectd:

sudo aptitude install lex yacc automake autoconf \
libtool bison flex libltdl3-dev pkg-config \
libperl-dev libncurses5-dev

fixed collectd-nagios:

git clone git://git.verplant.org/collectd.git
cd collectd
git checkout 953bd0f881faa40c415a1f1a9d7e2da739d343ff
# or skip the checkout and just use the latest

further links (2)

my puny percentage patch:

playing with initial version of openvz monitoring via perl plugin:

final notes